SMOOTHING 3-D DATA FOR TORPEDO PATHS


I. THE GENERAL PROBLEM

A. Data

Data in the form of ordered quadruplets ($t_i$, $x_i$, $y_i$, $z_i$) are available from 3-D files on torpedo and target paths. The times $t_i$ are sufficiently accurate that they can be assumed to be without error. The spatial coordinates $x_i$, $y_i$, and $z_i$, however, are not only subject to measurement errors, but may also contain erratic measurements or be missing for some of the equally spaced time intervals.

B. Desired Output

Information can be extracted from these data either as:

(1) smoothed information as a function of time (parametric form), or

(2) smoothed information at a specified sequence of times.

A comparison of the computational requirements of the two procedures depends on the length of the intervals used in smoothing and on the number of times in the sequence of interest. Both procedures use the same smoothing techniques.

The information to be extracted from the 3-D data includes:

(1) smoothed position coordinates

(a) as functions of time (i.e., $x = f_x(t)$, $y = f_y(t)$, $z = f_z(t)$)

(b) at specified times $t_j$ (i.e., $x(t_j)$, $y(t_j)$, $z(t_j)$),

(3) velocity component estimates

(a) as functions of time (i.e., $V_x(t)$, $V_y(t)$, $V_z(t)$)

(b) at specified times $t_j$ (i.e., $V_x(t_j)$, $V_y(t_j)$, $V_z(t_j)$),

(4) relative torpedo and target geometry in the vicinity of intercept.

C. Data Sample

The path of the torpedo involves maneuvers, so segments must be selected for application of the smoothing technique. The lengths of the segments, and hence the number of available data points, are open to selection. The curves used to fit the data will primarily be polynomials. Longer path segments will generally require higher order polynomials and be more difficult to fit with acceptably small residuals. On the other hand, short intervals contain fewer data points, which limits the capability for reducing prediction errors. The trade-off must be resolved by considering the potential paths and the measurement errors. Some indication is given in subsequent sections of this report, where data for a specific torpedo path are analyzed. Initially, two sample sizes (n=11 and n=21) are considered.

One of the questionable features of small samples is the possibility of reducing them further by deleting data points that appear inconsistent with the remaining data.
II. DATA SMOOTHING

A. Methodology

The data smoothing considered in this report is limited to the method of least squares. Other methods, such as Kalman filtering, would be appropriate for real-time data smoothing, where interest centers on the next data point following the data used in the smoothing. In its current form, however, that method is not appropriate for post-experimental application, where times within the data sample are of interest.

The data smoothing techniques currently used at IVPS involve the least squares method with the following equations:

(1) $x(t) = a + bt$ (linear)

(2) $x(t) = a + bt + ct^2$ (quadratic, parabolic)

(3) $x(t) = a + b\ln(t)$ (logarithmic).

This report concentrates on the addition of higher order polynomials, in particular:

(4) $x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3$ (cubic)

(5) $x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$ (quartic).


The linear least squares technique is described in Appendix A. The sum of squares of the residuals,

$$D = \sum_{i=1}^{n} e_i^2, \qquad e_i = x_i - \hat{x}(t_i),$$

provides a basis for selecting the particular equation to be used in fitting a particular set of data. The statistic

$$S_e^2 = D/(n-k),$$

where n is the number of points in the sample and k is the number of parameters in the equation, provides an estimate of the variance $\sigma^2$ of the measurement errors.
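For reference, the least squares coefficients of a polynomial of degree m (so k = m + 1 parameters) satisfy the standard normal equations below. This is the textbook formulation, stated here as an aid; it may differ in notation from Appendix A, which is not reproduced in this section.

```latex
\sum_{j=0}^{m} a_j \sum_{i=1}^{n} t_i^{\,j+l} = \sum_{i=1}^{n} x_i\, t_i^{\,l},
\qquad l = 0, 1, \dots, m,

D = \sum_{i=1}^{n} \Big( x_i - \sum_{j=0}^{m} a_j t_i^{\,j} \Big)^2,
\qquad S_e^2 = \frac{D}{n-k}, \quad k = m + 1.

% Straight line (m = 1), with \bar t and \bar x the sample means:
b = \frac{\sum_i (t_i - \bar t)(x_i - \bar x)}{\sum_i (t_i - \bar t)^2},
\qquad a = \bar x - b\,\bar t.
```

When the $t_i$ are placed symmetrically about zero, the sums of odd powers of $t_i$ vanish and these equations split into separate systems for the odd and even coefficients.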

B. Sequential Differences

A preliminary screening of sample data by successive differences can serve a dual purpose:

(1) indication of the order of the polynomial required to produce a reasonable fit, and

(2) indication of isolated wild data points (outliers).

The first through fourth successive differences are presented in Table 1 for the case in which the actual relationship of x to t is linear, and in Table 2 for the case in which it is quadratic. In both tables a perturbation d is introduced in one of the observations $x_i$.
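The behaviour tabulated in Tables 1 and 2 can be reproduced with a few lines of Python. The sketch below is illustrative only: the function name `successive_differences` and the noise-free path x = 10 + 2t with a perturbation d = 5 at the middle point are hypothetical choices, not the entries of Table 1. It shows how a single perturbation spreads into the higher differences with alternating binomial weights (d, -2d, d in the second differences, and so on).

```python
from typing import List, Sequence


def successive_differences(values: Sequence[float], max_order: int) -> List[List[float]]:
    """Return [first, second, ..., max_order-th] successive differences.

    Forward convention: D_k[i] = D_{k-1}[i + 1] - D_{k-1}[i].
    """
    result: List[List[float]] = []
    current = list(values)
    for _ in range(max_order):
        current = [b - a for a, b in zip(current, current[1:])]
        result.append(current)
    return result


# Hypothetical noise-free linear path x = 10 + 2t, t = 0..6,
# with a perturbation d = 5 added to the middle observation.
path = [10 + 2 * t for t in range(7)]
path[3] += 5

diffs = successive_differences(path, 4)
for order, d in enumerate(diffs, start=1):
    print(order, d)
# 1 [2, 2, 7, -3, 2, 2]
# 2 [0, 5, -10, 5, 0]
# 3 [5, -15, 15, -5]
# 4 [-20, 30, -20]
```

Because the first differences of a straight line are constant, everything beyond them is produced by the perturbation alone; with measurement noise added, the same pattern is superimposed on random terms.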

There are several salient features of successive differences that should be noted:

(1) Ignore, for the moment, the perturbation d. In Table 1, the first differences (the $\Delta_{1i}$'s) consist of the velocity term $a_1$ plus noise. If $a_1$ is large with respect to the noise (the $n_i$'s), these differences will all have the same sign. The second differences (the $\Delta_{2i}$'s), however, involve only noise, and their signs should be random. This change from consistent signs for the $\Delta_{1i}$'s to random signs for the $\Delta_{2i}$'s indicates that a linear relationship of x to t is appropriate.


In passing, it should be noted that

$$\frac{1}{6}\sum_{i=1}^{6} \Delta_{1i} = a_1 + (n_6 - n_0)/6,$$

and that $\Delta_{1i}$ is normally distributed, i.e.,

$$\Delta_{1i} \sim N(a_1, 2\sigma^2).$$

Table 1. Successive Differences - Linear Case

Table 2. Sequential Differences - Quadratic Case

It should also be noted that if $a_1$ is not large with respect to $\sigma$, the $\Delta_{1i}$'s can still have the sign of $a_1$, with the dominance of this sign depending upon the relative magnitudes of $a_1$ and $\sigma$.


Next, consider the quadratic case (Table 2). The $\Delta_{3i}$'s have random signs and the $\Delta_{2i}$'s are dominated by the sign of $a_2$; hence the quadratic is indicated as the appropriate polynomial. Note that the signs of the $\Delta_{1i}$'s may also be the same for all i if $a_1$ and $a_2$ have the same sign. If $a_1$ and $a_2$ have opposite signs and $|a_1|$ is greater than $|a_2|$, there can be a change in the sign of the $\Delta_{1i}$'s where

$$a_1 + \left[i^2 - (i-1)^2\right]a_2 = a_1 + (2i-1)\,a_2$$

changes sign. In the vicinity of this point the noise terms (the $n_i$'s) can become significant and produce some terms of random sign.

Higher order differences are required to deal with higher order polynomials. In general, random signs in the (k+1)st order differences together with consistent signs in the kth order differences indicate selection of a kth order polynomial to fit the data.


(2) The perturbation d was included to examine the effect of an isolated outlier on successive differences. For illustrative purposes, a successive difference greater than three times the standard deviation of the noise in that difference will be taken as an indication that a perturbation exists. The value $\sigma = 4$ will also be used for illustrative purposes.

Now note the entries in the lower part of Table 1. Unless $a_1$ is known (or estimated), a critical magnitude for the $\Delta_{1i}$'s cannot be specified. For higher order differences, the ith difference of order j ($\Delta_{ji}$) has a normal distribution,

$$\Delta_{ji} \sim N(k_i d, \sigma_{\Delta_j}^2),$$

where $k_i$ is the coefficient of d in $\Delta_{ji}$ and $\sigma_{\Delta_j}^2$ is the variance of the noise in the jth order difference. If d = 0, then

$$\Delta_{ji} \sim N(0, \sigma_{\Delta_j}^2).$$

The situation is an application of statistical hypothesis testing. If $\Delta_{ji}$ is larger in magnitude than can be expected from noise alone, the presence of a perturbation (an outlier) is indicated. The critical magnitudes obtained using 1.0 - 0.99 = 0.01 as the significance level and $\sigma = 4$ are presented in the last row of Table 1. Thus, if $|\Delta_{2i}| > 17$, $|\Delta_{3i}| > 18$, or $|\Delta_{4i}| > 17$ for any i, an outlier is indicated.

Note that the value $\sigma = 4$ was assumed for this illustration. If sequential differences are used for preliminary screening before least squares curve fitting is performed, the estimate $S_e$ of $\sigma$ will not be available. A value of $\sigma$ may be assumed from prior information on measurement errors, but for purposes of preliminary screening some value greater than 4 would permit elimination of data points with large perturbations.


It should be emphasized that the above discussion pertains to the simplest situations. For applications where there are missing data points, or where perturbations are not isolated, more guidance will be required. The assumption that the noise components (the $n_i$'s) are independent and have the same variance also warrants reservations in applications of the models.
III. APPLICATION

A. Sample Data

A specific test in which a torpedo was launched against a submarine at the Naval Undersea Warfare Engineering facilities will be used for illustration. The 3-D data include equally spaced times from 814 to 1000; very few data points are missing. Figure 1 shows the torpedo path, plotting every fifth point. Segments of this torpedo path are selected for application of the methodology presented in Section II. The presentation is restricted to the x and y coordinates.

B. Data Sample I

The initial 21 points (814-834) appear to lie on a straight line in Figure 1 and were selected as the first data sample. These data are presented in Figure 2 and Table 3.

(1) Successive differences:

The first and second order successive differences are also presented in Table 3. For the x component, all the first differences are negative and the second differences appear random, except possibly in the tail of the sample, where a sequence of four pluses occurs, including one value ($\Delta_{2,17} = 17.2$) large enough that it might indicate an outlier. The alternating sign patterns (-, +, - or +, -, +) that an isolated outlier would produce are not present, so an isolated outlier does not appear likely.

For the y component, all the first order successive differences are positive and the second order differences appear somewhat random. Again, $\Delta_{2,17} = -18.2$ indicates that something has occurred in the vicinity of $t_{18}$. Higher order differences were not explored for this sample.
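As a check on the values quoted above, the second differences can be recomputed directly from the Sample I coordinates. The sketch assumes the coordinate values of Table 3, with $y_{15} = -2626.7$ as listed in Tables 4a and 4b; the helper names are illustrative. It locates the largest second difference in each component using the forward indexing of Table 3, in which the second difference with index i involves points i, i+1, and i+2.

```python
from typing import List, Sequence, Tuple

# Sample I coordinates (points 814-834), i = 1..21.
X = [5228.6, 5156.8, 5085.1, 5018.3, 4944.2, 4878.1, 4800.0, 4731.7,
     4652.2, 4583.6, 4510.7, 4440.2, 4367.0, 4297.0, 4226.1, 4153.6,
     4084.0, 4017.7, 3968.6, 3924.6, 3868.0]
Y = [-3465.1, -3407.0, -3346.2, -3285.1, -3222.2, -3165.6, -3105.8,
     -3049.7, -2987.2, -2930.8, -2870.5, -2810.8, -2750.0, -2690.0,
     -2626.7, -2571.6, -2511.6, -2449.1, -2404.8, -2357.1, -2309.1]


def second_differences(values: Sequence[float]) -> List[float]:
    """Delta_2,i = v[i+2] - 2 v[i+1] + v[i] (forward convention, 0-based list)."""
    return [values[i + 2] - 2 * values[i + 1] + values[i]
            for i in range(len(values) - 2)]


def largest(values: Sequence[float]) -> Tuple[int, float]:
    """1-based index and rounded value of the entry with the largest magnitude."""
    j = max(range(len(values)), key=lambda k: abs(values[k]))
    return j + 1, round(values[j], 1)


print(largest(second_differences(X)))  # (17, 17.2)
print(largest(second_differences(Y)))  # (17, -18.2)
```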

(2) Least squares smoothing:

Both linear and quadratic functions were fitted using the least squares method outlined in Appendix A. The results are presented below:

Table 3. Successive Differences - Sample I

  i      x_i    Δ1x     Δ2x        y_i    Δ1y     Δ2y
  1   5228.6  -71.8    +0.1    -3465.1  +58.1    +2.7
  2   5156.8  -71.7    +2.9    -3407.0  +60.8    +0.3
  3   5085.1  -68.8    -5.3    -3346.2  +61.1    +1.8
  4   5018.3  -74.1    +8.0    -3285.1  +62.9    -6.3
  5   4944.2  -66.1   -12.0    -3222.2  +56.6    +3.2
  6   4878.1  -78.1    +9.8    -3165.6  +59.8    -3.7
  7   4800.0  -68.3   -11.2    -3105.8  +56.1    +6.4
  8   4731.7  -79.5    +9.9    -3049.7  +62.5    -6.1
  9   4652.2  -68.6    -4.3    -2987.2  +56.4    +3.9
 10   4583.6  -72.9    +2.4    -2930.8  +60.2    -0.6
 11   4510.7  -70.5    -2.7    -2870.5  +59.7    +1.1
 12   4440.2  -73.2    +3.2    -2810.8  +60.8    -0.8
 13   4367.0  -70.0    -0.9    -2750.0  +60.0    +3.3
 14   4297.0  -70.9    -1.6    -2690.0  +63.3    -8.2
 15   4226.1  -72.5    +2.9    -2626.7  +55.1    +4.9
 16   4153.6  -69.6    +3.3    -2571.6  +60.0    +2.5
 17   4084.0  -66.3   +17.2    -2511.6  +62.5   -18.2
 18   4017.7  -49.1    +5.1    -2449.1  +44.3    +3.4
 19   3968.6  -44.0   -12.6    -2404.8  +47.7    +0.3
 20   3924.6  -56.6            -2357.1  +48.0
 21   3868.0                   -2309.1

Forward differences: Δ1 in row i is x(i+1) - x(i); Δ2 in row i is Δ1(i+1) - Δ1(i); likewise for y.
Linear

$\hat{x}(t) = 5288.3 - 69.78t$, $S_{xe} = 16.73$

$\hat{y}(t) = -3518.1 + 58.72t$, $S_{ye} = 8.33$

Quadratic

$\hat{x}(t) = 5318.6 - 77.67t + 0.3588t^2$, $S_{xe} = 11.62$

$\hat{y}(t) = -3532.0 + 62.33t - 0.1642t^2$, $S_{ye} = 6.30$
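The linear fit above can be reproduced by solving the normal equations directly. The sketch below is a generic polynomial fitter in plain Python, not the program of Appendix A; the function name `polyfit_ls` is hypothetical. With t = 1, ..., 21 for points 814-834 it should return coefficients close to 5288.3 and -69.78 and $S_{xe}$ close to 16.73, where $S_e = \sqrt{D/(n-k)}$.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def polyfit_ls(t: Sequence[float], x: Sequence[float],
               degree: int) -> Tuple[List[float], float, float]:
    """Least squares polynomial fit via the normal equations.

    Returns (a_0..a_degree, D, S_e), with D the residual sum of squares
    and S_e = sqrt(D / (n - k)), k = degree + 1.
    """
    k = degree + 1
    n = len(t)
    # Normal equations A a = r: A[p][q] = sum t^(p+q), r[p] = sum x t^p.
    A = [[float(sum(ti ** (p + q) for ti in t)) for q in range(k)] for p in range(k)]
    r = [sum(xi * ti ** p for ti, xi in zip(t, x)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, k):
            f = A[row][col] / A[col][col]
            for q in range(col, k):
                A[row][q] -= f * A[col][q]
            r[row] -= f * r[col]
    # Back substitution.
    a = [0.0] * k
    for p in range(k - 1, -1, -1):
        a[p] = (r[p] - sum(A[p][q] * a[q] for q in range(p + 1, k))) / A[p][p]
    D = sum((xi - sum(aj * ti ** j for j, aj in enumerate(a))) ** 2
            for ti, xi in zip(t, x))
    return a, D, sqrt(D / (n - k))


# Sample I, x component: t = 1..21 for points 814-834 (Table 3).
T = list(range(1, 22))
X = [5228.6, 5156.8, 5085.1, 5018.3, 4944.2, 4878.1, 4800.0, 4731.7,
     4652.2, 4583.6, 4510.7, 4440.2, 4367.0, 4297.0, 4226.1, 4153.6,
     4084.0, 4017.7, 3968.6, 3924.6, 3868.0]

coeffs, D, s_xe = polyfit_ls(T, X, 1)
print([round(c, 2) for c in coeffs], round(s_xe, 2))  # about [5288.33, -69.78] 16.73

q_coeffs, q_D, s_xe_quad = polyfit_ls(T, X, 2)
print([round(c, 4) for c in q_coeffs], round(s_xe_quad, 2))
```

Solving the normal equations directly is adequate for the low orders used here; for higher orders, centring t (as is done later for Sample II) keeps the sums of powers smaller.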


The residual deviations

$$e_{xi} = x_i - \hat{x}(t_i), \qquad e_{yi} = y_i - \hat{y}(t_i)$$

are shown in Figure 3. Note that there is a definite trend in these residuals toward the end of the sample. Note also the general trend of the residuals: a small random pattern superimposed on a curve for each residual set. Higher order polynomials could be used to remove the general curve (this was not explored). Note, further, that no violent outliers are indicated. The fitted linear function is shown in Figure 2, and the observed and predicted values of $x_i$ and $y_i$ are presented in Tables 4a and 4b together with the residuals in these components and the deviation

$$d_i = \sqrt{e_{xi}^2 + e_{yi}^2}.$$

The sequences of signs observed in Table 4a for the $e_{xi}$'s and $e_{yi}$'s are of interest. There is a sequence of +'s, followed by a sequence of -'s, and ending with a sequence of +'s for the $e_{xi}$'s. Similarly, there is a sequence of -'s, followed by a sequence of +'s, and ending with a sequence of -'s for the $e_{yi}$'s. (The sign of $e_{y8}$ can be ignored or changed, since its magnitude is small.)
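The sign-sequence argument can be made quantitative by counting runs. The sketch below is not part of the report's procedure: it borrows the expected number of runs from the Wald-Wolfowitz runs test, $2n_+n_-/(n_+ + n_-) + 1$ for $n_+$ positive and $n_-$ negative residuals in random order, and applies it to the Table 4a residuals. The names `sign_runs` and `expected_runs` are illustrative.

```python
from typing import Sequence

# Residuals of the linear fit to Sample I (Table 4a).
E_X = [10.1, 8.0, 6.1, 9.1, 4.8, 8.4, 0.1, 1.6, -8.1, -7.0, -10.1,
       -10.8, -14.2, -14.4, -15.6, -18.3, -18.1, -14.6, 6.1, 31.8, 45.0]
E_Y = [-5.7, -6.3, -4.2, -1.9, 2.3, 0.2, 1.3, -1.3, 2.4, 0.1, 1.7,
       3.0, 4.8, 5.7, 9.6, 7.0, 8.3, 12.1, -2.4, -13.4, -24.1]


def sign_runs(residuals: Sequence[float]) -> int:
    """Number of maximal runs of same-signed residuals (zeros skipped)."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def expected_runs(residuals: Sequence[float]) -> float:
    """Mean number of runs if the same +/- counts were randomly ordered."""
    n_pos = sum(1 for r in residuals if r > 0)
    n_neg = sum(1 for r in residuals if r < 0)
    return 2 * n_pos * n_neg / (n_pos + n_neg) + 1


for name, e in (("e_x", E_X), ("e_y", E_Y)):
    print(name, sign_runs(e), round(expected_runs(e), 1))
# e_x 3 11.5
# e_y 5 10.9
```

Three runs against about 11.5 expected for $e_{xi}$, and five against about 10.9 for $e_{yi}$, point to systematic curvature rather than random noise.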
Figure 3. Least squares residuals - Sample I.
Table 4a. Linear Regression - Sample I

  i      x_i   x̂(t_i)    e_xi       y_i    ŷ(t_i)   e_yi    d_i
  1   5228.6   5218.5   +10.1   -3465.1   -3459.4   -5.7   11.6
  2   5156.8   5148.8    +8.0   -3407.0   -3400.7   -6.3   10.2
  3   5085.1   5079.0    +6.1   -3346.2   -3342.0   -4.2    7.4
  4   5018.3   5009.2    +9.1   -3285.1   -3283.2   -1.9    9.3
  5   4944.2   4939.4    +4.8   -3222.2   -3224.5   +2.3    5.3
  6   4878.1   4869.7    +8.4   -3165.6   -3165.8   +0.2    8.4
  7   4800.0   4799.9    +0.1   -3105.8   -3107.1   +1.3    1.3
  8   4731.7   4730.1    +1.6   -3049.7   -3048.4   -1.3    2.1
  9   4652.2   4660.3    -8.1   -2987.2   -2989.6   +2.4    8.5
 10   4583.6   4590.6    -7.0   -2930.8   -2930.9   +0.1    7.0
 11   4510.7   4520.8   -10.1   -2870.5   -2872.2   +1.7   10.2
 12   4440.2   4451.0   -10.8   -2810.5   -2813.5   +3.0   11.2
 13   4367.0   4381.2   -14.2   -2750.0   -2754.8   +4.8   15.0
 14   4297.0   4311.4   -14.4   -2690.3   -2696.0   +5.7   15.5
 15   4226.1   4241.7   -15.6   -2626.7   -2637.3   +9.6   18.3
 16   4153.6   4171.9   -18.3   -2571.6   -2578.6   +7.0   19.6
 17   4084.0   4102.1   -18.1   -2511.6   -2519.9   +8.3   19.9
 18   4017.7   4032.3   -14.6   -2449.1   -2461.2  +12.1   19.0
 19   3968.6   3962.5    +6.1   -2404.8   -2402.4   -2.4    6.6
 20   3924.6   3892.8   +31.8   -2357.1   -2343.7  -13.4   34.5
 21   3868.0   3823.0   +45.0   -2309.1   -2285.0  -24.1   51.1
Table 4b. Quadratic Regression - Sample I

  i      x_i   x̂(t_i)    e_xi       y_i    ŷ(t_i)   e_yi    d_i
  1   5228.6   5241.3   -12.7   -3465.1   -3469.8   +4.7   13.5
  2   5156.8   5164.7    -7.9   -3407.0   -3408.0   +1.0    8.0
  3   5085.1   5088.8    -3.7   -3346.2   -3346.4   +0.2    3.7
  4   5018.3   5013.6    +4.7   -3285.1   -3285.3   +0.2    4.7
  5   4944.2   4939.2    +5.0   -3222.2   -3224.4   +2.2    5.5
  6   4878.1   4865.5   +12.6   -3165.6   -3163.9   -1.7   12.7
  7   4800.0   4792.5    +7.5   -3105.8   -3103.7   -2.1    7.8
  8   4731.7   4720.2   +11.5   -3049.7   -3043.8   -5.8   12.9
  9   4652.2   4648.6    +3.6   -2987.2   -2984.3   -2.9    4.6
 10   4583.6   4577.8    +5.8   -2930.8   -2925.1   -5.7    8.1
 11   4510.7   4507.6    +3.1   -2870.5   -2866.2   -4.3    5.3
 12   4440.2   4438.2    +2.0   -2810.8   -2807.6   -3.2    3.8
 13   4367.0   4369.5    -2.5   -2750.0   -2749.4   -0.6    2.6
 14   4297.0   4301.5    -4.5   -2690.0   -2691.5   +1.5    4.7
 15   4226.1   4234.2    -8.1   -2626.7   -2633.9   +7.2   10.8
 16   4153.6   4167.7   -14.1   -2571.6   -2576.7   +5.1   15.0
 17   4084.0   4101.9   -17.9   -2511.6   -2519.7   +8.1   19.7
 18   4017.7   4036.7   -19.0   -2449.1   -2463.2  +14.1   23.7
 19   3968.6   3972.3    -3.7   -2404.8   -2406.9   +2.1    4.3
 20   3924.6   3908.7   +15.9   -2357.1   -2351.0   -6.1   17.0
 21   3868.0   3845.7   +22.3   -2309.1   -2295.4  -13.7   26.2
These sign sequences would ordinarily indicate that the next higher order polynomial, a quadratic, should do well in reducing the residual errors. This is not substantiated, however, as Table 4b demonstrates. The residuals in this table form four sequences of the same sign, which suggests that even a cubic polynomial will not necessarily produce an excellent fit to the data; this was not explored further.

An alternative to using higher order polynomials is a reduction in sample size. This alternative was explored for subsamples of n=11 points. The results are shown below:

                    Linear           Quadratic
Sample Points    S_xe    S_ye     S_xe    S_ye
814-824           3.3     2.0       -       -
819-829           2.9     1.9      2.1     1.8
824-834          15.4     9.5       -       -
829-839          13.9    11.1       -       -

The three basic causes of residuals are:

(a) maneuvers of the object tracked (represented by the polynomial),

(b) noise in the measurements (represented by $\sigma$, of which $S_e$ is an estimate), and

(c) outliers (these will be discussed later in this report).

It is assumed that there are no outliers in Sample I. Subsample 2 (points 819 to 829) appears to be fitted quite well by a straight line, and the quadratic was applied to give an estimate of the size of $\sigma$. The first subsample (points 814 to 824) is fitted reasonably well by a straight line, so the quadratic was not tried. The last two subsamples have substantially larger $S_e$'s. This could be caused either by torpedo maneuvers or by a larger noise component (larger $\sigma$); this was not explored.
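The subsample comparison can be sketched as a sliding window of 11 points, each fitted by a straight line. The sketch below uses only the Sample I coordinates, so the window 829-839 (which extends beyond Sample I) is omitted, and it uses closed-form simple linear regression; the function name is illustrative. For the first window it gives a value close to the tabulated 3.3; for the other windows the printed values need not match the tabulated ones exactly.

```python
from math import sqrt
from typing import Sequence

# Sample I x-coordinates, points 814-834 (Table 3).
X = [5228.6, 5156.8, 5085.1, 5018.3, 4944.2, 4878.1, 4800.0, 4731.7,
     4652.2, 4583.6, 4510.7, 4440.2, 4367.0, 4297.0, 4226.1, 4153.6,
     4084.0, 4017.7, 3968.6, 3924.6, 3868.0]


def linear_s_e(x: Sequence[float]) -> float:
    """S_e of a least squares straight line fitted against t = 0, 1, ..., n-1."""
    n = len(x)
    t_bar = (n - 1) / 2
    x_bar = sum(x) / n
    s_tt = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (xi - x_bar) for t, xi in enumerate(x)) / s_tt
    resid = [xi - (x_bar + b * (t - t_bar)) for t, xi in enumerate(x)]
    return sqrt(sum(r * r for r in resid) / (n - 2))


# Overlapping 11-point windows lying inside Sample I; S_e does not depend
# on where the time origin of each window is placed.
for start, label in ((0, "814-824"), (5, "819-829"), (10, "824-834")):
    print(label, round(linear_s_e(X[start:start + 11]), 1))
```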
C. Data Sample II

The second sample selected for study was the set with times 867 to 887. These 21 points appear to follow a curved path that might be fitted by a quadratic. First, consider the successive differences in Table 5. A difficulty similar to an outlier is indicated in the vicinity of i = 6 (time 872). Examination of the first successive differences shows a drop in velocity between $t_5$ and $t_6$ and only partial recovery between $t_6$ and $t_7$. One possible explanation would be an additional data point between $t_5$ and $t_6$. The actual explanation is the inadvertent introduction of a measurement from a different array, taken at time $t_5$ and entered as the measurement at $t_6$. Measurements at $t_7$ and subsequent times should be shifted to the respective preceding times.
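The effect of the stray measurement, and of shifting the later measurements back by one time slot, can be checked on the first ten x-values of Table 5. The sketch below is an illustration under that reading of the text: it drops the value recorded at $t_6$ so that the measurements from $t_7$ onward occupy the preceding slots, then compares the largest second difference before and after. The helper name is hypothetical.

```python
from typing import List, Sequence

# x-coordinates at t_1..t_10 of Sample II (times 867-876), from Table 5.
X = [2012.0, 2030.0, 2056.1, 2091.0, 2134.2, 2142.3, 2183.2, 2241.8,
     2305.5, 2377.1]


def second_differences(values: Sequence[float]) -> List[float]:
    """Delta_2,i = v[i+2] - 2 v[i+1] + v[i]."""
    return [values[i + 2] - 2 * values[i + 1] + values[i]
            for i in range(len(values) - 2)]


before = max(abs(d) for d in second_differences(X))
# Drop the stray value entered at t_6 (list index 5); later values move up one slot.
repaired = X[:5] + X[6:]
after = max(abs(d) for d in second_differences(repaired))
print(round(before, 1), round(after, 1))  # 35.1 9.6
```

After the shift, the second differences settle into the steady positive pattern expected of a quadratic path.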


Instead of fitting all of Sample II, eleven points (872-882) were selected somewhat arbitrarily for fitting by least squares; these are plotted in Figure 4. The second differences all have the same sign, and the third differences are small and of apparently random sign. The least squares straight line fit is presented in Table 6a and sketched in Figure 4. (Note the shift in the time scale, t = -5, ..., 5.) This shift was introduced to reduce the magnitudes of the numbers calculated in determining the fitted line and $S_e$. In dealing with the quadratic, the means $\bar{x} = \frac{1}{11}\sum x_i$ and $\bar{y} = \frac{1}{11}\sum y_i$ were also subtracted from each observation $x_i$ and $y_i$, respectively, for the same reason. Table 6b presents the quadratic regression. The reduction in the $S_e$'s is dramatic, as would be expected from Figure 4. All of the residuals are less than 5 and hence within the residual noise that could be expected with a $\sigma$ of 2 or 3. The signs of the $e_{xi}$'s, however, show some indication of a lack of randomness. For this reason, a third-degree polynomial was tried for the $x_i$'s only. This produced the value $S_{xe} = 0.946$, with the maximum magnitude of any $e_{xi}$ being 1.2. The cubic fits the data very well indeed.
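Centring the time scale has a further benefit besides smaller numbers. With t = -5, ..., 5 the sums of odd powers of t vanish, so the normal equations split into an odd part and an even part; the linear coefficient of the quadratic fit is then the same as that of the straight line, which is why 81.19 appears in both Table 6a and Table 6b. The sketch below, with a hypothetical function name, solves the centred quadratic in closed form for the x-values of Table 6b.

```python
from math import sqrt
from typing import Sequence, Tuple

T = list(range(-5, 6))  # centred time index of Tables 6a and 6b
X = [2183.2, 2241.8, 2305.5, 2377.1, 2451.6, 2533.8,
     2619.6, 2707.7, 2799.6, 2891.6, 2987.3]


def centred_quadratic(t: Sequence[int],
                      x: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit x = a + b t + c t^2 on a grid symmetric about t = 0.

    Returns (a, b, c, S_e). Requires sum(t) == 0 and sum(t**3) == 0.
    """
    assert sum(t) == 0 and sum(ti ** 3 for ti in t) == 0, "grid must be symmetric"
    n = len(t)
    x_bar = sum(x) / n
    t2_bar = sum(ti * ti for ti in t) / n
    # Odd part: the same slope as a straight-line fit.
    b = sum(ti * xi for ti, xi in zip(t, x)) / sum(ti * ti for ti in t)
    # Even part: regress x on the centred squares u = t^2 - mean(t^2).
    u = [ti * ti - t2_bar for ti in t]
    c = sum(ui * xi for ui, xi in zip(u, x)) / sum(ui * ui for ui in u)
    a = x_bar - c * t2_bar
    resid = [xi - (a + b * ti + c * ti * ti) for ti, xi in zip(t, x)]
    s_e = sqrt(sum(r * r for r in resid) / (n - 3))
    return a, b, c, s_e


a, b, c, s_e = centred_quadratic(T, X)
print(round(a, 1), round(b, 2), round(c, 3))  # 2533.9 81.19 2.057
```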

D. Data Sample III

The third sample selected for study involved an S-shaped maneuver, as indicated by the 21 points (848-868) shown in Figure 5. The x and y coordinates of these points are presented in Figure 6, where it is evident that first and second order polynomials will not provide acceptable fits to the data. A third-order polynomial appears possible for the $y_i$'s and a fourth-order polynomial for the $x_i$'s. A subset of 11 points (851-861, or points 4-14 in Figure 6 and Table 7) will be used for illustration.
Table 5. Successive Differences - Sample II

  i      x_i    Δ1x     Δ2x     Δ3x        y_i    Δ1y     Δ2y     Δ3y
  1   2012.0  +18.0    +8.1    +0.7    -1255.5  +94.2    -3.1    +0.1
  2   2030.0  +26.1    +8.8    -0.4    -1161.3  +91.1    -3.0    +1.9
  3   2056.1  +34.9    +8.3   -43.4    -1070.2  +88.1    -1.1  -106.5
  4   2091.0  +43.2   -35.1   +70.9     -982.1  +87.0  -107.6  +227.9
  5   2134.2   +8.1   +32.8   -15.1     -895.1  -20.6  +120.3  -142.3
  6   2142.3  +40.9   +17.7   -12.6     -915.7  +99.7   -22.0   +12.8
  7   2183.2  +58.6    +5.1    +2.8     -816.0  +77.7    -9.2    +6.3
  8   2241.8  +63.7    +7.9    -5.0     -738.3  +68.5    -2.9    -7.1
  9   2305.5  +71.6    +2.9    +4.8     -669.8  +65.6   -10.0    +5.2
 10   2377.1  +74.5    +7.7    -4.1     -604.2  +55.6    -4.8    -2.8
 11   2451.6  +82.2    +3.6    -1.3     -548.6  +50.8    -7.6    -0.3
 12   2533.8  +85.8    +2.3    +1.5     -497.8  +43.2    -8.4    +0.4
 13   2619.6  +88.1    +3.8    -3.7     -454.6  +34.8    -8.0    -0.4
 14   2707.7  +91.9    +0.1    +3.6     -419.8  +26.8    -8.4    -2.2
 15   2799.6  +92.0    +3.7    -3.5     -393.0  +18.4   -10.6    +3.5
 16   2891.6  +95.7    +0.2    -1.8     -374.6   +7.8    -7.1    -4.0
 17   2987.3  +95.9    -1.6    +5.9     -366.8   +0.7   -11.1    -1.7
 18   3083.2  +94.3    +4.3    -9.2     -366.1  -10.4   -12.8    +9.5
 19   3177.5  +98.7    -4.9             -376.5  -23.2    -3.3
 20   3276.2  +93.8                     -399.7  -26.5
 21   3370.0                            -426.2

Forward differences, as in Table 3.
Table 6a. Linear Regression - 11 Points (872-882)

 t_i     x_i   x̂(t_i)    e_xi      y_i   ŷ(t_i)    e_yi    d_i
  -5  2183.2   2148.5   +34.7   -816.0   -762.0   -54.0   64.2
  -4  2241.8   2229.7   +12.1   -738.3   -716.5   -21.8   24.9
  -3  2305.5   2310.9    -5.4   -669.8   -671.1    +1.3    5.6
  -2  2377.1   2392.1   -15.0   -604.2   -625.7   +21.5   26.2
  -1  2451.6   2473.2   -21.6   -548.6   -580.2   +31.6   38.3
   0  2533.8   2554.4   -20.6   -497.8   -534.8   +37.0   42.4
   1  2619.6   2635.6   -16.0   -454.6   -489.4   +34.8   38.3
   2  2707.7   2716.8    -9.1   -419.8   -443.9   +24.1   25.7
   3  2799.6   2798.0    +1.6   -393.0   -398.5    +5.5    5.7
   4  2891.6   2879.2   +12.4   -374.6   -353.1   -21.5   24.8
   5  2987.3   2960.4   +26.9   -366.1   -307.6   -58.5   64.4

$\hat{x}(t) = 2554.4 + 81.19t$, $S_{xe} = 20.33$

$\hat{y}(t) = -534.8 + 45.43t$, $S_{ye} = 36.41$
Table 6b. Quadratic Regression - 11 Points (872-882)

 t_i     x_i   x̂(t_i)   e_xi      y_i   ŷ(t_i)   e_yi   d_i
  -5  2183.2   2179.3   +3.9   -816.0   -817.8   +1.8   4.3
  -4  2241.8   2242.0   -0.2   -738.3   -738.9   +0.6   0.6
  -3  2305.5   2308.8   -3.3   -669.8   -667.4   -2.4   4.1
  -2  2377.1   2379.7   -2.6   -604.2   -603.3   -0.9   2.8
  -1  2451.6   2454.7   -3.1   -548.6   -546.7   -1.9   3.6
   0  2533.8   2533.9   -0.1   -497.8   -497.6   -0.2   0.2
   1  2619.6   2617.1   +2.5   -454.6   -455.9   +1.3   2.8
   2  2707.7   2704.5   +3.2   -419.8   -421.6   +1.8   3.7
   3  2799.6   2796.0   +3.6   -393.0   -394.8   +1.8   4.0
   4  2891.6   2891.6    0.0   -374.6   -375.5   -0.8   0.8
   5  2987.3   2991.3   -4.0   -366.1   -363.5   -2.6   4.8

$\hat{x}(t) = 2533.9 + 81.19t + 2.057t^2$

$\hat{y}(t) = -497.6 + 45.43t - 3.724t^2$
The results of fitting third-degree polynomials to these 11 points are presented in Table 8, and those for the fourth-degree polynomial in Table 9. The cubic equation fits the y component quite well, but even the quartic equation leaves something to be desired (a smaller $S_e$) for the x component. Higher order polynomials were not tried. The estimates $S_e$ of $\sigma$ obtained by fitting polynomials to the subsample of 11 points are presented below:

Order of Polynomial    S_xe    S_ye
1                      66.8    94.5
2                      37.3    42.6
3                      34.0     3.5
4                       9.3      -

Improvement in fitting the y component by increasing the order of the polynomial is quite dramatic, but the improvement is considerably slower for the x component. The third-order polynomial could be considered acceptable for y, but a fifth-order polynomial should be tried for x. The order of polynomial used does not have to be the same for both components.
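The x column of the table above can be reproduced with a general least squares polynomial fit. The sketch below is a plain-Python normal-equations solver (hypothetical name `polyfit_ls`, not the program of Appendix A) applied to points 4-14 of Sample III with t = -5, ..., 5. It also shows that the residual sum of squares D never increases with the order, whereas $S_e = \sqrt{D/(n-k)}$ also depends on the shrinking divisor n - k.

```python
from math import sqrt
from typing import List, Sequence, Tuple

T = list(range(-5, 6))  # points 851-861 of Sample III, centred
X = [2777.0, 2702.5, 2617.8, 2524.3, 2440.0, 2385.7,
     2369.8, 2395.5, 2406.1, 2373.1, 2305.3]


def polyfit_ls(t: Sequence[float], x: Sequence[float],
               degree: int) -> Tuple[List[float], float, float]:
    """Least squares polynomial fit; returns (a_0..a_degree, D, S_e)."""
    k = degree + 1
    n = len(t)
    # Normal equations A a = r: A[p][q] = sum t^(p+q), r[p] = sum x t^p.
    A = [[float(sum(ti ** (p + q) for ti in t)) for q in range(k)] for p in range(k)]
    r = [sum(xi * ti ** p for ti, xi in zip(t, x)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, k):
            f = A[row][col] / A[col][col]
            for q in range(col, k):
                A[row][q] -= f * A[col][q]
            r[row] -= f * r[col]
    # Back substitution.
    a = [0.0] * k
    for p in range(k - 1, -1, -1):
        a[p] = (r[p] - sum(A[p][q] * a[q] for q in range(p + 1, k))) / A[p][p]
    D = sum((xi - sum(aj * ti ** j for j, aj in enumerate(a))) ** 2
            for ti, xi in zip(t, x))
    return a, D, sqrt(D / (n - k))


results = {degree: polyfit_ls(T, X, degree) for degree in range(1, 5)}
for degree, (coeffs, D, s_e) in results.items():
    print(degree, round(D, 1), round(s_e, 1))
```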

E. Discussion

Only one in-water run was examined and, for it, only selected sections of the torpedo path were treated in any detail. Nevertheless, some conclusions can be drawn about the application of the Sequential Differences and Least Squares Regression techniques to 3-D data.

(1) Sequential differences:

(a) These differences provide some capability for locating isolated outlier points that differ substantially from the path of the object being tracked. This was illustrated in Sample II. The model shown in Tables 1 and 2 needs extension to higher order polynomial paths and multiple outliers. Also, the critical magnitudes for sequential differences (refer to Table 1) must be increased to allow for accelerations, since the use of sequential differences will precede fitting a polynomial, and hence the order of the fitted polynomial will not be known at that time. Thus, sequential differences should be used only for a first screening for gross outliers.
Figure 5. Data sample III - points 848-868.
Table 7. Successive Differences - Sample III

  i      x_i    Δ1x     Δ2x     Δ3x        y_i    Δ1y     Δ2y     Δ3y
  1   2949.3  -40.5   -10.8   +10.1    -1364.0  +74.4    -0.4   -17.9
  2   2889.3  -56.8    -0.7   -22.3    -1289.6  +74.0   -17.5   +49.8
  3   2828.5  -51.5   -23.0   +12.8    -1215.6  +56.5   +32.3   -83.6
  4   2777.0  -74.5   -10.2    +1.4    -1159.1  +88.8   -51.3   +14.8
  5   2702.5  -84.7    -8.8   +18.0    -1070.3  +37.5   -36.5    -8.5
  6   2617.8  -93.5    +9.2   +20.8    -1032.8   +1.0   -45.0   +16.6
  7   2524.3  -84.3   +30.0    +8.4    -1031.8  -44.0   -28.4    +9.1
  8   2440.0  -54.3   +38.4    +3.2    -1075.8  -72.4   -19.3   +32.1
  9   2385.7  -15.9   +41.6   -56.7    -1148.2  -91.7   +12.8   -25.3
 10   2369.8  +25.7   -15.1   -28.5    -1239.9  -78.9   -12.5   +15.4
 11   2395.5  +10.6   -43.6    +8.8    -1328.8  -91.4    +2.9   +21.2
 12   2406.1  -33.0   -34.8   +13.4    -1420.2  -88.5   +24.1   +15.3
 13   2373.1  -67.8   -21.4   +19.2    -1508.7  -64.4   +39.4    +2.4
 14   2305.3  -89.2    -2.2   +17.9    -1573.1  -25.0   +41.8    -4.9
 15   2216.1  -91.4   +15.7   +17.8    -1598.1  +16.8   +36.9    -7.0
 16   2124.7  -75.7   +33.5    +4.2    -1581.3  +53.7   +29.9   -20.3
 17   2049.0  -42.2   +37.7   -23.5    -1527.6  +83.6    +9.6    -7.5
 18   2006.8   -4.5   +14.2    -5.9    -1440.0  +93.2    +2.1    -3.2
 19   2002.3   +9.7    +8.3            -1350.8  +95.3    -1.1
 20   2012.0  +18.0                    -1255.5  +94.2
 21   2030.0                           -1161.3

Forward differences, as in Table 3.
Table 8. Cubic Regression - Sample III (11 points)

 t_i     x_i   x̂(t_i)    e_xi       y_i    ŷ(t_i)   e_yi    d_i
  -5  2777.0   2804.8   -27.8   -1159.1   -1159.0   -0.1   27.8
  -4  2702.5   2680.2   +22.3   -1070.3   -1066.7   -3.6   22.6
  -3  2617.8   2583.9   +33.9   -1032.8   -1028.6   -4.2   34.2
  -2  2524.3   2511.8   +12.5   -1031.8   -1035.5   +3.7   13.0
  -1  2440.0   2459.6   -19.6   -1075.8   -1078.3   +2.5   19.8
   0  2385.7   2423.3   -37.6   -1148.2   -1147.7   -0.5   37.6
   1  2369.8   2398.6   -28.8   -1239.9   -1234.7   -5.2   29.3
   2  2395.5   2381.3   +14.2   -1328.8   -1330.0   +1.2   14.3
   3  2406.1   2367.6   +38.5   -1420.2   -1424.5   +4.3   38.7
   4  2373.1   2352.8   +20.3   -1508.7   -1509.1   +0.4   20.3
   5  2305.3   2333.1   -27.8   -1573.1   -1574.5   +1.4   27.8

$\hat{x}(t) = 2423.3 - 29.812t + 5.827t^2 - 0.694308t^3$

$\hat{y}(t) = -1147.7 - 79.73t - 8.761t^2 + 1.5271t^3$


Table 9. Quartic Regression - Sample III (11 points)

 t_i     x_i   x̂(t_i)    e_xi
  -5  2777.0   2774.0    +3.0
  -4  2702.5   2711.0    -8.5
  -3  2617.8   2614.8    +3.0
  -2  2524.3   2516.9    +7.4
  -1  2440.0   2439.1    +0.9
   0  2385.7   2392.5    -6.8
   1  2369.8   2378.1    -8.3
   2  2395.5   2386.6    +8.9
   3  2406.1   2398.4    +7.7
   4  2373.1   2383.7   -10.6
   5  2305.3   2302.3    +3.0

$\hat{x}(t) = 2392.4 - 29.812t + 16.533t^2 - 0.6943t^3 - 0.428234t^4$
(b) Sequential differences also provide some indication of the order of polynomial that will be required. One indicator is the number of sign changes that occur in the successive differences of a particular order. If there are few sign changes, a non-random effect is indicated and a higher order polynomial is called for. Thus, for example, in Sample II the 11-point data subset shows a long sequence of +'s for the $\Delta_{2i}$'s but no such sequence (indicating randomness) for the $\Delta_{3i}$'s. Hence, a third order polynomial can be expected to provide some improvement over a second-order polynomial. This type of information may be difficult to incorporate into a data smoothing algorithm, but even a simple procedure can help reduce the computational load.

(2) Sample Size:

(a) Although a sample of 21 points could possibly be fitted with an acceptably small $S_e$ in some instances (the quadratic was not tried on Sample II), it would appear that smaller samples (e.g., n=11) will allow the data to be fitted with a reasonably low-order polynomial. The size n=11 is not sacrosanct, but it leaves some room for the elimination of outliers and so seems to be a reasonable size.

(3)  Least  squares  smoothing: 

(a) By its nature, the residual sum of squares cannot increase as the order of the polynomial increases, so the estimate $S_e$ of the standard deviation $\sigma$ of the measurement noise generally decreases as the order increases. (A polynomial of order n-1 can fit n points exactly, so all residuals would be zero.) The appropriate order is one that reduces $S_e$ to the level of the noise in the measurements. This level may vary with the path and with the array making the measurements. For the portions of the path examined, $\sigma_y$ is suspected to be smaller than $\sigma_x$, since $S_{ye}$ is generally smaller than $S_{xe}$ for a given order of polynomial. The decision to fit a higher-order polynomial depends on the value of $S_e$ obtained for a given order. If $S_e$ is already small (3 or 4), higher-order polynomials cannot be expected to give much improvement. The extent to which $S_e$ can be reduced depends on the component as well as on the polynomial degree.


(4)  Outliers: 


(a) In addition to rough screening for outliers by sequential differences, further screening can be performed using the residual errors after a polynomial has been fitted to the data. Outliers contribute substantially to $S_e$, and the two basic techniques for reducing $S_e$ are eliminating points with large residuals and increasing the order of the polynomial.

(b) Outliers can be eliminated using the residuals after smoothing in two ways:

(1) by confidence intervals: a residual greater in magnitude than some specified multiple (3 or larger) of $S_e$ can be considered an outlier, and

(2) by variance reduction: the ratio of the $S_e$ values before and after removal of a point, or points, with substantial residuals can be used as the basis for deciding whether to remove the points. For example, if $S_e(\text{after})/S_e(\text{before}) \le r$, the points should be removed (Grubbs' criterion). The value of r lies between 0.0 and 1.0 and could be changed depending upon the magnitude of $S_e$.
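Both screening rules can be sketched in Python with numpy. One caution shapes the form used for rule (1): with all n points in the fit, no residual can exceed $\sqrt{n-p}\,S_e$, where p is the number of fitted coefficients (a standard bound on studentized residuals), so for a straight line fitted to 11 points a multiple of 3 could never be exceeded. The sketch therefore judges each point against a fit made without it. This leave-one-out variant, the function names, the data, and the default r = 0.5 are assumptions for illustration, not the procedure prescribed above.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_se(t, x, degree):
    """Coefficients and S_e of a least squares polynomial fit."""
    c = P.polyfit(t, x, degree)
    resid = x - P.polyval(t, c)
    return c, float(np.sqrt(np.sum(resid**2) / (len(t) - degree - 1)))

def confidence_rule(t, x, degree, multiple=3.0):
    """Rule (1), leave-one-out variant: flag point i when its deviation from
    the fit made without it exceeds `multiple` times that fit's S_e."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    flagged = []
    for i in range(len(t)):
        keep = np.arange(len(t)) != i
        c, s_e = fit_se(t[keep], x[keep], degree)
        if abs(x[i] - P.polyval(t[i], c)) > multiple * s_e:
            flagged.append(i)
    return flagged

def variance_reduction_rule(t, x, degree, candidates, r=0.5):
    """Rule (2): remove `candidates` when S_e(after)/S_e(before) <= r.
    Returns the (possibly reduced) t, x and the ratio."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    keep = np.ones(len(t), dtype=bool)
    keep[list(candidates)] = False
    ratio = fit_se(t[keep], x[keep], degree)[1] / fit_se(t, x, degree)[1]
    return (t[keep], x[keep], ratio) if ratio <= r else (t, x, ratio)

# Hypothetical data: a straight line, small alternating noise, one gross error.
t = np.arange(-5, 6, dtype=float)
x = 2.0 * t + 1.0 + 0.5 * (-1.0) ** np.arange(11)
x[7] += 40.0
print(confidence_rule(t, x, degree=1))                 # [7]
t2, x2, ratio = variance_reduction_rule(t, x, 1, [7])
print(len(t2), round(ratio, 3))
```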

(5)  Sampling  rate: 

(a) The smoothing of 3-D data can be performed to provide either a parametric representation of path segments or specific information, such as position and velocity, only at certain points on the path. These will be called "parametric estimation" and "point estimation," respectively.

(b) To illustrate parametric estimation, consider data collected at 200 sequential observation times (e.g., 800 to 1,000 for the 3-D data used in this section). Samples of 11 points will be used. Sample S_1 will consist of points 1 through 11, sample S_2 of points 10 through 20 and, in general, sample S_j (j ≥ 2) of points 10(j-1) through 10j. There will then be 20 samples on the path. Each sample of 11 points is to be fitted by a polynomial of appropriate degree, and the parameters of the polynomial, together with the value of $S_e$, recorded for the path segment represented by that sample. Note that there will be two points of overlap between S_1 and S_2 and one point of overlap thereafter.
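The sample layout just described can be generated mechanically. A minimal Python sketch, assuming 1-based point numbers and the 10-point step used above (the function name `parametric_samples` is hypothetical):

```python
def parametric_samples(num_points=200, step=10):
    """1-based, inclusive point ranges for the parametric-estimation samples:
    S_1 = points 1..step+1, and S_j = points step*(j-1)..step*j for j >= 2."""
    samples = [(1, step + 1)]
    j = 2
    while step * j <= num_points:
        samples.append((step * (j - 1), step * j))
        j += 1
    return samples

samples = parametric_samples()
print(len(samples), samples[:3], samples[-1])   # 20 [(1, 11), (10, 20), (20, 30)] (190, 200)
```

Each range holds 11 points; S_1 and S_2 share points 10 and 11, and later neighbours share a single point.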
(c) For point estimation, a sequence of points must be provided. For data consisting of 200 points, it may be considered that occasional monitoring is sufficient for points 0 to 50 and 100 to 150, but that the behavior of the path from points 50 to 100 should be monitored more often and the behavior from points 150 to 200 should be followed closely. Then the following sequence of points could be considered reasonable:


 j    Points in S_j    Midpoint t_j
 1        5-15              10
 2       25-35              30
 3       45-55              50
 4       55-65              60
 5       65-75              70
 6       75-85              80
 7       85-95              90
 8       95-105            100
 9      115-125            120
10      135-145            140
11      145-155            150
12      150-160            155
13      155-165            160
14      160-170            165
15      165-175            170
16      170-180            175
17      175-185            180
18      180-190            185
19      185-195            190
20      190-200            195
(d) At each midpoint time t_j, the position coordinate estimates, the velocity components, the resultant velocity, and $S_{ej}$ can be recorded, together with additional information such as acceleration components, if desired. Note that the sequence of 20 points suggested above has substantial overlap of samples in some cases and data gaps between samples in other cases. This was introduced intentionally, since least squares smoothing produces better estimates (smaller confidence intervals) at the midpoint of the sample when the fitted curve is a straight line (refer to Appendix B).
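A sketch of one point estimate, assuming numpy, unit spacing between observation times (so velocities are per observation interval), and a polynomial degree already chosen by Steps 6 to 10 of Section IV; `point_estimate` and the synthetic data are hypothetical. Shifting the time origin to the midpoint t_j makes the constant term of each fitted polynomial the smoothed position and the linear term the velocity component there:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def point_estimate(times, xyz, midpoint, degree=2):
    """Smoothed position, velocity components, resultant speed and per-axis
    S_e at `midpoint`, from one sample.

    times: observation times of the sample; xyz: array of shape (len(times), 3).
    """
    tc = np.asarray(times, float) - midpoint      # time origin at the midpoint
    xyz = np.asarray(xyz, float)
    pos, vel, s_e = [], [], []
    for k in range(3):
        c = P.polyfit(tc, xyz[:, k], degree)
        resid = xyz[:, k] - P.polyval(tc, c)
        pos.append(c[0])                          # value of the fit at t_j
        vel.append(c[1])                          # derivative of the fit at t_j
        s_e.append(np.sqrt(resid @ resid / (len(tc) - degree - 1)))
    vel = np.array(vel)
    return np.array(pos), vel, float(np.linalg.norm(vel)), np.array(s_e)

# Hypothetical noise-free sample j = 4 (points 55-65, midpoint 60).
times = np.arange(55, 66)
u = times - 60.0
xyz = np.column_stack([3 + 2 * u, -1 + 0.5 * u**2, 4 - u])
pos, vel, speed, s_e = point_estimate(times, xyz, midpoint=60)
print(np.round(pos, 6), np.round(vel, 6), round(speed, 6))
```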

(e) Parametric estimation could also be modified to delete some samples (e.g., alternate samples from t_j = 100 to t_j = 150). When a straight line (first-order polynomial) is used, achieving the quality of the point estimation procedure at times other than the parametric sample midpoints would require greater modification. When higher-order polynomials are required, the advantage of the best estimate lying at the midpoint of the sample is lost (refer to Appendix B). Making both techniques available provides some flexibility in data smoothing to accommodate potential customers.
IV. A DATA SMOOTHING ALGORITHM


The  following  procedure  is  suggested  for  smoothing  3-D  data: 

Step 1: Select an appropriate sample size. (11 is suggested as being small enough to provide some capability of fitting path segments of maneuvering torpedoes without requiring high-order polynomials. It also provides some leeway for dropping outliers.)

Step 2: Select parametric or point estimation.

Step 3: Select the sampling rate. (A standard rate such as that described in Section III E4 should be provided as the default rate for parametric estimation, and the midpoints of these samples as the default rate for point estimation.)

Step 4: Adjust the data for missing data points. (The principle applied here is to minimize the effect of the supplied values on the sequential differences. For a single missing datum, the average of the values at the two adjacent times will minimize the second differences. In any case, data supplied in this step must be removed before least squares smoothing is applied.)
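A minimal sketch of Step 4 in Python, assuming missing values are represented as `None` and handling only isolated gaps, as the text does; `fill_single_gaps` is a hypothetical name. Setting $x_i = (x_{i-1} + x_{i+1})/2$ makes the second difference centered on $x_i$, namely $x_{i-1} - 2x_i + x_{i+1}$, exactly zero:

```python
def fill_single_gaps(values):
    """Replace each isolated missing value (None) by the mean of its two
    neighbours. Returns the filled list and the indices that were supplied,
    so those points can be removed again before least squares smoothing.
    Longer gaps are left as None."""
    filled = list(values)
    supplied = []
    for i in range(1, len(values) - 1):
        if values[i] is None and values[i - 1] is not None and values[i + 1] is not None:
            filled[i] = 0.5 * (values[i - 1] + values[i + 1])
            supplied.append(i)
    return filled, supplied

data = [2183.2, 2241.8, None, 2377.1, None, None, 2619.6]   # hypothetical gaps
filled, supplied = fill_single_gaps(data)
print(filled, supplied)
```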

Step  5:  Calculate  first,  second,  and  third  order  sequential  differences. 

Step 6: Determine the approximate polynomial order k. (The (k+1)-order sequential differences should contain noise only and thus have random signs. Sequences of 4 or more differences with the same sign suggest the presence of a non-random component, as does the occurrence of 4 or fewer changes of sign. The presence of a non-random component is going to be awkward to identify. If the second differences are random, then k = 1. If the second-order differences are non-random but the third-order differences are random, then k = 2. If the third-order differences are non-random, then fourth-order differences should be calculated and examined for randomness. This examination of sequential differences in increasing order should probably not be carried beyond the fifth order.)
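Steps 5 and 6 can be sketched as follows, assuming numpy. The sign-run and sign-change thresholds are those quoted in Step 6 and are applied as fixed numbers regardless of how many differences are available, which is only one way to read the rule; the function names are hypothetical. If even the fifth-order differences look non-random, the sketch returns 5, the degree limit used in Step 10:

```python
import numpy as np

def looks_random(d, max_run=4, max_changes=4):
    """Treat differences as noise only, unless a run of `max_run` or more
    equal signs occurs or there are `max_changes` or fewer sign changes.
    Exact zeros are ignored."""
    signs = np.sign(np.asarray(d, float))
    signs = signs[signs != 0]
    if len(signs) < 2:
        return True
    changes = int(np.sum(signs[1:] != signs[:-1]))
    longest = run = 1
    for prev, cur in zip(signs[:-1], signs[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest < max_run and changes > max_changes

def approximate_order(x, highest_difference=5):
    """Smallest k whose (k+1)-th order differences look random; differences
    are not examined beyond `highest_difference`."""
    x = np.asarray(x, float)
    for k in range(1, highest_difference):
        if looks_random(np.diff(x, n=k + 1)):
            return k
    return highest_difference

i = np.arange(11)
print(approximate_order(10.0 * i + (-1.0) ** i))          # 1
print(approximate_order(3.0 * i**2 + 0.1 * (-1.0) ** i))  # 2
```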
Step 7: Screen successive differences for gross outliers. (This must follow determination of the approximate degree of the polynomial, since it should be based on comparing the magnitude of a deviation with the noise only, as indicated in Tables 1 and 2. The critical values suggested in those tables should be increased substantially. Some limit, possibly between 50 and 100, should be selected, keeping in mind that this is a first screening for gross outliers and that a second screening will be made. Any outliers found in this step, however, will reduce computations in later steps. Remove any outliers found, together with the observations for the other space components at the same observation time.)
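A sketch of this first screening, assuming numpy and differences of order k+1 so that the path trend has been removed. An isolated error at point j enters each of the k+2 differences of order k+1 that contain it, with binomial weights whose magnitudes are at least 1, so the sketch flags j only when all of those differences exceed the limit. The limit of 75 is a hypothetical choice inside the 50 to 100 range mentioned above, the function name is hypothetical, and points too close to either end of the sample are not screened:

```python
import numpy as np

def gross_outliers(x, order, limit=75.0):
    """Flag points whose every containing difference of the given order
    exceeds `limit` in magnitude.

    d[i] = np.diff(x, n=order)[i] involves x[i], ..., x[i + order], so point j
    appears in d[j - order], ..., d[j]. Points within `order` of either end
    are skipped because some of their differences do not exist.
    """
    x = np.asarray(x, float)
    d = np.abs(np.diff(x, n=order))
    flagged = []
    for j in range(order, len(x) - order):
        if np.all(d[j - order:j + 1] > limit):
            flagged.append(j)
    return flagged

x = 10.0 * np.arange(11)      # hypothetical straight path (k = 1)
x[5] += 200.0                 # one gross error
print(gross_outliers(x, order=2))   # [5]
```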

Step 8: Check for polynomial degree compatibility. (If the number of outliers removed, r, satisfies the inequality r + k ≥ n - 1, where k is the degree of polynomial found in Step 6 and n is the sample size after the data points supplied in Step 4 are removed, then fitting a polynomial of order k will be inappropriate. For example, if r = 4 points are removed from an 11-point sample in which one data point was created in Step 4, only 6 points remain, and a polynomial of degree 5 can be fitted to them without any residual errors, since there are 6 linear equations in the 6 coefficients.)

Step 9: Fit a polynomial of degree k to the data. (The least squares procedure outlined in Appendix A is applicable. At this step only $S_{ke}$ need be determined, not the coefficients.)

Step 10: Seek an acceptable $S_e$. (If $S_e$ is unacceptably large, repeat Step 9 with k replaced by k + 1. Repeat this step until either $S_e$ is acceptable or a polynomial of degree 5 has been fitted to the data.)

Step 11: Complete the least squares polynomial fit. (The coefficients of the polynomial of the degree found in Step 10, and the residual errors, are now needed.)

Step 12: Second screening for outliers. (One of the procedures discussed in Section III E3 should be applied to locate any outliers not found in Step 7. Remove the outliers.)

Step 13: Repeat Steps 9, 10, 11, and 12 until no more outliers are found. (The polynomial obtained will be used for smoothing the sample data. Note that the alternative procedure of searching the residuals for each polynomial degree to locate outliers may result in removing points that are not actually outliers but legitimate observations for a higher-degree polynomial. On the other hand, the proposed method could use a higher-order polynomial to fit outliers when a lower-order polynomial should actually be used. There is a choice of the type of misfit that is acceptable.)
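Steps 8 through 13 for one coordinate of one sample can be sketched as a loop, assuming numpy. The acceptable $S_e$ (`se_ok`), the threshold r = 0.5, and the use of the variance-reduction rule of item (4)(b)(2) on the single largest residual as the second screening are illustrative assumptions, not values fixed by the text; the function names are hypothetical. As Step 13 warns, raising the degree can absorb an outlier; in the usage example `se_ok` is set loosely enough that the straight line is kept and the gross error is removed instead:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit(t, x, k):
    """Degree-k least squares polynomial: coefficients, residuals and S_e."""
    c = P.polyfit(t, x, k)
    res = x - P.polyval(t, c)
    return c, res, float(np.sqrt(res @ res / (len(t) - k - 1)))

def smooth_sample(t, x, k, se_ok, r=0.5, max_degree=5):
    """k: approximate degree from Step 6; se_ok: largest acceptable S_e.
    Returns the coefficients, S_e and the points retained."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    while True:
        if len(t) - k - 1 < 1:                        # Step 8: degree compatibility
            raise ValueError(f"degree {k} leaves no residual degrees of freedom")
        c, res, s_e = fit(t, x, k)                    # Step 9
        while s_e > se_ok and k < max_degree and len(t) - k - 2 >= 1:
            k += 1                                    # Step 10: raise the degree
            c, res, s_e = fit(t, x, k)
        # Steps 11-12: test the point with the largest residual by rule (2).
        worst = int(np.argmax(np.abs(res)))
        keep = np.arange(len(t)) != worst
        if s_e == 0 or len(t) - k - 2 < 1:
            return c, s_e, t, x
        if fit(t[keep], x[keep], k)[2] / s_e > r:
            return c, s_e, t, x                       # Step 13: no more outliers
        t, x = t[keep], x[keep]                       # remove and repeat Steps 9-12

t = np.arange(-5, 6, dtype=float)
x = 2.0 * t + 1.0 + 0.5 * (-1.0) ** np.arange(11)    # hypothetical data
x[7] += 40.0
coeffs, s_e, t_kept, x_kept = smooth_sample(t, x, k=1, se_ok=20.0)
print(np.round(coeffs, 3), round(s_e, 3), t_kept)
```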


Step 14: Record the smoothed path. (For parametric form, if specified in Step 2, the recorded data include the coefficients of the fitted polynomial, $S_{ej}$, and $n_j$ for each sample S_j specified in Step 3. For point estimation form, if specified in Step 2, the recorded data include the time t_j, the estimated coordinates $\hat{x}_j = \hat{x}(t_j)$, $\hat{y}_j = \hat{y}(t_j)$, and $\hat{z}_j = \hat{z}(t_j)$, the velocity components, $S_{ej}$, and $n_j$ for each point specified in Step 3. Additional path information may also be specified, e.g., acceleration components.)
V. CONCLUSIONS AND RECOMMENDATIONS

The procedure suggested in Section IV provides a reasonable approach for obtaining the information desired in parts (1), (2), and (3) of Section I B. No attempt has been made to provide the information in part (4).

In implementing this procedure, several parameters must be specified:

A.  Sample  Size  (Step  1) 

A smaller sample size of n = 7 has been suggested. This would permit fitting path segments containing maneuvers with lower-order polynomials, but it is subject to greater degradation by missing data points and removal of outliers. Experience with the relative occurrence of such events in actual field data will be useful in selecting an appropriate sample size.

B. Choice of Parametric or Point Estimation (Step 2) and Sampling Rate (Step 3)

The wishes of the customers who will use the smoothed data are of primary concern here.

C.  Specifying  Approximate  Polynomial  Order  (Step  6) 

It will be difficult to specify a simple rule for determining that the k-th order sequential differences contain non-random components but the (k+1)-st order differences involve only random components. The Theory of Runs can be of some help here, although a simpler rule is desirable; this needs further study.
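One standard form of the Theory of Runs is the Wald-Wolfowitz runs test, applied here to the signs of a sequence of differences. The sketch below uses the usual normal approximation; for the 6 to 10 differences available in an 11-point sample that approximation is rough, and exact tables of the runs distribution would be preferable. It is offered only as a possible starting point for the further study recommended above, and the function name is hypothetical:

```python
import math

def runs_test(values):
    """Wald-Wolfowitz runs test on the signs of `values` (zeros dropped).

    Returns (runs, expected_runs, z, two_sided_p) from the normal approximation.
    Too few runs (z well below 0) indicates a non-random, trend-like component.
    """
    signs = [v > 0 for v in values if v != 0]
    n1 = sum(signs)                  # number of positive values
    n2 = len(signs) - n1             # number of negative values
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need both signs and at least three nonzero values")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    return runs, mean, z, math.erfc(abs(z) / math.sqrt(2.0))

print(runs_test([3.1, -2.0, 1.4, -0.7, 2.2, -1.9, 0.8, -1.1]))  # many runs
print(runs_test([1.2, 0.8, 2.5, 1.9, -0.4, -1.3, -2.2, -0.9]))  # two runs
```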

D.  Rough  Screening  For  Outliers  (Step  7) 

A reasonable critical level for identifying outliers by sequential differences must be established. The occurrence of an isolated outlier was considered in Section II B. Other potential producers of large sequential differences, such as paired outliers, violent changes in velocity, et cetera, should be examined for their resultant effects. Identifying signatures for such effects will be useful in using sequential differences to identify outliers.
E. Polynomial Degree Limitations (Step 10)

The limitation of polynomial degree to 5 or less appears reasonable for samples of size 11. The possibility of decreasing this limit to 4, or increasing it to 6 or higher, should be considered; this may require more experience with in-water run data. For smaller sample sizes, such as n = 7, reducing this limit to a lower polynomial degree should be considered.

F. Computing Smoothed Path (Step 11)

The pivotal condensation method outlined in Appendix A can be simplified even further in certain cases, which may occur frequently enough to justify exploiting their commonality in the computer program. In particular, when the sample consists of n = 11 data points at adjacent times, shifting the time origin to the midpoint of the sample produces the following effects:

(1) the coefficients of the polynomial parameters in the normal equations are the same for all samples,

(2) only the last column in the pivotal condensation format changes from sample to sample, and

(3) the other columns in the pivotal condensation format require only the addition of a row and a column in each box when the next higher-degree polynomial is considered.

The above commonality is also clearly evident in the vector representation presented in Appendix A. The extent to which this commonality can be exploited depends primarily upon the rarity of missing data points and outliers. Indeed, depending upon the requirements of the ultimate users, data smoothing could conceivably be restricted to only such samples.
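The commonality can be seen directly, assuming numpy: for 11 adjacent times centered on the midpoint, $X^T X$ and $H = (X^T X)^{-1} X^T$ depend only on the times, the odd power sums vanish, and raising the degree only borders $X^T X$ with a new row and column. In this sketch (function name hypothetical) the coefficients for any complete sample are then just $H\mathbf{x}$:

```python
import numpy as np

T = np.arange(-5, 6, dtype=float)   # 11 adjacent times, origin at the sample midpoint

def smoothing_matrix(degree):
    """Return X^T X and H = (X^T X)^{-1} X^T for a degree-`degree` fit on T.

    Both depend only on T, so they can be computed once and reused for every
    complete 11-point sample; the coefficients of a sample x are H @ x.
    """
    X = np.vander(T, degree + 1, increasing=True)   # columns 1, t, t^2, ...
    XtX = X.T @ X
    H = np.linalg.solve(XtX, X.T)
    return XtX, H

XtX, H = smoothing_matrix(2)
print(XtX)   # odd power sums vanish: rows (11, 0, 110), (0, 110, 0), (110, 0, 1958)

# Reusing H for one sample: the centered x' values tabulated in Appendix B-1.
x_prime = np.array([-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
                    65.2, 153.3, 245.2, 337.2, 432.9])
a = H @ x_prime      # a0, a1, a2 of the quadratic fit
print(np.round(a, 3))
```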


In summary, the data smoothing algorithm presented in Section IV appears reasonable, but there are several elements that must be specified before it can be implemented. Some of these can be improved by further research; others depend upon the quality of the data, which can only be determined by experience with actual 3-D data. Finally, some of them can only be determined in consultation with the ultimate users of the smoothed data.
APPENDIX A

LEAST SQUARES DATA SMOOTHING

A-1 LINEAR LEAST SQUARES WITH ONE PREDICTOR

Sample:

$(x_i, y_i), \quad i = 1, 2, \ldots, n$

Assumptions:

A1: The actual relationship between X and Y is linear, i.e.,
$$y(x) = \alpha + \beta x$$

A2: The abscissas are without errors:
$$x_i = X_i$$

A3: The ordinates contain measurement or observational errors:
$$y_i = y(x_i) + \epsilon_i, \qquad \epsilon_i = y_i^{\mathrm{obs}} - y(x_i) = \text{observational error}$$

Problem: Fit a straight line to the data.

Engineer's Solution:

Let
$$\hat{y}(x) = a + bx, \qquad e_i = y_i - \hat{y}(x_i), \qquad D = \sum e_i^2 = \sum (y_i - a - b x_i)^2.$$
The coefficients a and b are selected to minimize D (the sum of squares of the deviations of the observed $y_i$'s from the fitted line). Setting
$$\frac{\partial D}{\partial a} = 0 \quad \text{and} \quad \frac{\partial D}{\partial b} = 0$$
gives the two equations
$$n a + \left(\sum x_i\right) b = \sum y_i$$
$$\left(\sum x_i\right) a + \left(\sum x_i^2\right) b = \sum x_i y_i$$
Solving these equations yields the desired estimates a and b for the parameters $\alpha$ and $\beta$, i.e.,
$$b = \frac{n\left(\sum x_i y_i\right) - \left(\sum x_i\right)\left(\sum y_i\right)}{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2}, \qquad a = \frac{\left(\sum y_i\right) - b\left(\sum x_i\right)}{n}$$

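Computational Format:

The following format uses pivotal condensation to produce a and b. It also yields D, and hence the sample variance
$$S_e^2 = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2 = \frac{D}{n-2},$$
without requiring calculation of the individual $e_i$'s.

Sums: $n$, $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum x_i y_i$, $\sum y_i^2$

$$A_{xx} = \frac{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2}{n}, \qquad A_{xy} = \frac{n\left(\sum x_i y_i\right) - \left(\sum x_i\right)\left(\sum y_i\right)}{n}, \qquad A_{yy} = \frac{n\left(\sum y_i^2\right) - \left(\sum y_i\right)^2}{n}$$

$$D = \frac{A_{xx}A_{yy} - A_{xy}^2}{A_{xx}}, \qquad S_e^2 = \frac{D}{n-2}$$

$$b = \frac{A_{xy}}{A_{xx}}, \qquad a = \frac{\left(\sum y_i\right) - b\left(\sum x_i\right)}{n}$$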
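The computational format translates directly into code. A plain-Python sketch (function name hypothetical), checked against the straight-line calculation for the centered data of Appendix B-1:

```python
def straight_line_condensation(x, y):
    """Pivotal-condensation format for the fitted line y = a + b x.

    Returns a, b, D (sum of squared deviations) and S_e^2 = D / (n - 2);
    the individual residuals e_i are never formed.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, y))
    syy = sum(v * v for v in y)
    a_xx = (n * sxx - sx * sx) / n
    a_xy = (n * sxy - sx * sy) / n
    a_yy = (n * syy - sy * sy) / n
    d = (a_xx * a_yy - a_xy * a_xy) / a_xx
    b = a_xy / a_xx
    a = (sy - b * sx) / n
    return a, b, d, d / (n - 2)

# Appendix B-1 data: centered times and centered x-coordinates.
t = list(range(-5, 6))
x_prime = [-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
           65.2, 153.3, 245.2, 337.2, 432.9]
a, b, d, s2 = straight_line_condensation(t, x_prime)
print(round(a, 2), round(b, 2), round(d, 2), round(s2 ** 0.5, 2))  # 0.04 81.19 3719.62 20.33
```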

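Statistician's Solution:

Statisticians augment the Engineer's Solution by adding the following assumptions:

A4: The observational errors (the $\epsilon_i$'s) are realizations of independent random variables, $E_i$'s, with zero means and common variance, i.e.,
$$\mu_E = \mathcal{E}(E_i) = 0, \qquad \sigma_E^2 = \mathcal{E}\left[(E_i - \mu_E)^2\right] = \sigma^2 \quad \text{for all } i.$$

A5: The observational errors are normally distributed random variables. This will be denoted by
$$E_i \sim N(0, \sigma^2)$$

Now let $y_i$ denote the realization of the random variable $Y_i$. Then
$$Y_i = y(x_i) + E_i \qquad \text{and} \qquad \mathcal{E}(Y_i) = y(x_i).$$
Further, the random variable $\hat{Y}(x)$ can be expressed in the form
$$\hat{Y}(x) = A + Bx,$$
where
$$B = \frac{n\sum x_i Y_i - \left(\sum x_i\right)\left(\sum Y_i\right)}{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2} = \frac{A_{xY}}{A_{xx}} \qquad \text{and} \qquad A = \frac{\left(\sum Y_i\right) - B\left(\sum x_i\right)}{n}.$$
Note that A and B are linear functions of the $Y_i$'s and hence of the $E_i$'s. It can now be shown that
$$\mu_A = \mathcal{E}(A) = \alpha, \qquad \mu_B = \mathcal{E}(B) = \beta, \qquad \text{and} \qquad \mu_{\hat{Y}(x)} = \mathcal{E}[\hat{Y}(x)] = y(x),$$
so that a and b are unbiased estimators for $\alpha$ and $\beta$. The evaluation of the variances of $\hat{Y}(x)$, A, and B is simplified if the $x_i$'s are shifted so that their mean is zero. Then, since $\sum x_i = 0$,
$$A_{xx} = \sum x_i^2, \qquad A_{xy} = \sum x_i y_i, \qquad b = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad a = \frac{\sum y_i}{n} = \bar{y}.$$
This shift in the x-axis will be assumed in the development which follows.

It can now be demonstrated that
$$\sigma_Y^2 = \sigma^2, \qquad \sigma_A^2 = \frac{\sigma^2}{n}, \qquad \sigma_B^2 = \frac{\sigma^2}{\sum x_i^2}, \qquad \mathrm{Cov}(A, B) = \mathcal{E}[(A - \alpha)(B - \beta)] = 0,$$
$$\sigma_{\hat{Y}(x)}^2 = \left(\frac{1}{n} + \frac{x^2}{\sum x_i^2}\right)\sigma^2, \qquad \text{and} \qquad \mathcal{E}(S_e^2) = \sigma^2.$$
The last relationship is very important, since $S_e^2$ is an unbiased estimator of $\sigma^2$ and is our only source of information on this parameter.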

The assumption of normality (A5), together with the linearity of the other random variables in the $E_i$'s, ensures that they are also normally distributed, i.e.,
$$Y \sim N(y, \sigma^2), \qquad A \sim N(\alpha, \sigma^2/n), \qquad B \sim N\left(\beta, \sigma^2\Big/\sum x_i^2\right), \qquad \hat{Y}(x) \sim N\left[y(x), \left(\frac{1}{n} + \frac{x^2}{\sum x_i^2}\right)\sigma^2\right].$$
The random variable $(n-2)S_e^2/\sigma^2$ has a Chi-Square distribution with n-2 degrees of freedom, and the random variables
$$T_A = \frac{\sqrt{n}\,(A - \alpha)}{S_e}, \qquad T_B = \frac{(B - \beta)\sqrt{\sum x_i^2}}{S_e}, \qquad T_{\hat{Y}(x)} = \frac{\hat{Y}(x) - y(x)}{S_e\sqrt{\frac{1}{n} + \frac{x^2}{\sum x_i^2}}}$$
have a Student-T distribution with n-2 degrees of freedom.

These distributions can then be used to establish confidence intervals for $\alpha$, $\beta$, and $y(x)$ at any x. Thus, for example, with k from Student-T tables such that
$$P(-k < T < +k) = 0.95,$$
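we have the following 95% confidence intervals:
$$\left(a - \frac{kS_e}{\sqrt{n}},\; a + \frac{kS_e}{\sqrt{n}}\right) \quad \text{for } \alpha,$$
$$\left(b - \frac{kS_e}{\sqrt{\sum x_i^2}},\; b + \frac{kS_e}{\sqrt{\sum x_i^2}}\right) \quad \text{for } \beta, \text{ and}$$
$$\left(\hat{y}(x) - kS_e\sqrt{\frac{1}{n} + \frac{x^2}{\sum x_i^2}},\; \hat{y}(x) + kS_e\sqrt{\frac{1}{n} + \frac{x^2}{\sum x_i^2}}\right) \quad \text{for } y(x)$$
at any x. It should be stressed that the confidence interval for $y(x)$ given above involves measurements about the mean of x ($\bar{x} = 0$). The general form for this confidence interval is
$$\left(\hat{y}(x) - kS_e\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}},\; \hat{y}(x) + kS_e\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}\right).$$
It should be noted that the confidence interval for $y(x)$ is shortest for $x = \bar{x}$ and increases as x deviates from this value.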
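The general interval can be evaluated with a small helper. A plain-Python sketch, assuming k has already been read from Student-T tables (the value 1.833 below is the one used in Appendix B; the function name is hypothetical):

```python
import math

def line_ci_factor(x0, xs, k):
    """C(x0) = k * sqrt(1/n + (x0 - xbar)^2 / sum((x_i - xbar)^2)).

    The interval for y(x0) is (yhat(x0) - C(x0) S_e, yhat(x0) + C(x0) S_e).
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((v - xbar) ** 2 for v in xs)
    return k * math.sqrt(1.0 / n + (x0 - xbar) ** 2 / sxx)

ts = list(range(-5, 6))          # 11 equally spaced times, mean zero
for t0 in range(6):
    print(t0, round(line_ci_factor(t0, ts, k=1.833), 3))
```

For these 11 centered times the factor grows from 0.553 at t = 0 to 1.034 at t = ±5, matching the $C_1(t)$ values in Appendix B and showing that the interval is narrowest at the sample midpoint.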


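A sketch of the situation can help clarify the mathematical elements involved:

$y(x) = \alpha + \beta x$ = actual linear relationship
$\hat{y}(x) = a + bx$ = fitted line
$\epsilon_i$ = observational error
$e_i$ = fitting error
$e(x)$ = prediction error at any x

[Figure: sketch of the actual and fitted lines with the errors defined above]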
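A-2 LINEAR LEAST SQUARES WITH TWO PREDICTORS

Sample:

$(x_{1i}, x_{2i}, y_i), \quad i = 1, 2, \ldots, n$

Assumptions:

A1: $y(\mathbf{x}) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2, \qquad \mathbf{x} = (x_1, x_2)$

A2: The predictor values $\mathbf{x}_i = (x_{1i}, x_{2i})$ are without errors

A3: $y_i = y(\mathbf{x}_i) + \epsilon_i$

Engineers' Solution:

Let
$$\hat{y}(\mathbf{x}) = a_0 + a_1 x_1 + a_2 x_2, \qquad D = \sum e_i^2 = \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2.$$
Minimizing D,
$$\frac{\partial D}{\partial a_0} = 0, \qquad \frac{\partial D}{\partial a_1} = 0, \qquad \frac{\partial D}{\partial a_2} = 0,$$
produces the normal equations
$$n a_0 + \left(\sum x_{1i}\right) a_1 + \left(\sum x_{2i}\right) a_2 = \sum y_i \qquad (1)$$
$$\left(\sum x_{1i}\right) a_0 + \left(\sum x_{1i}^2\right) a_1 + \left(\sum x_{1i}x_{2i}\right) a_2 = \sum x_{1i} y_i \qquad (2)$$
$$\left(\sum x_{2i}\right) a_0 + \left(\sum x_{1i}x_{2i}\right) a_1 + \left(\sum x_{2i}^2\right) a_2 = \sum x_{2i} y_i \qquad (3)$$
which can be solved for $a_0$, $a_1$, and $a_2$ in terms of the sample data.

Solving (1) for $a_0$ gives
$$a_0 = \left[\left(\sum y_i\right) - \left(\sum x_{1i}\right) a_1 - \left(\sum x_{2i}\right) a_2\right]/n \qquad (1')$$
Substituting (1') in (2) and (3) gives
$$A_{11} a_1 + A_{12} a_2 = A_{1y} \qquad (2')$$
$$A_{12} a_1 + A_{22} a_2 = A_{2y} \qquad (3')$$
Solving (2') for $a_1$ gives
$$a_1 = (A_{1y} - a_2 A_{12})/A_{11} \qquad (2'')$$
and substituting in (3') gives
$$B_{22}\, a_2 = B_{2y} \qquad (3'')$$
where the coefficients will be defined in the computational format which follows. Equations (3''), (2''), and (1') can be used to determine the values of $a_2$, $a_1$, and $a_0$.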

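COMPUTATIONAL FORMAT

Sums: $n$, $\sum x_{1i}$, $\sum x_{2i}$, $\sum y_i$; $\sum x_{1i}^2$, $\sum x_{1i}x_{2i}$, $\sum x_{1i}y_i$; $\sum x_{2i}^2$, $\sum x_{2i}y_i$; $\sum y_i^2$

$$A_{jk} = \frac{n\left(\sum x_{ji}x_{ki}\right) - \left(\sum x_{ji}\right)\left(\sum x_{ki}\right)}{n}, \qquad j, k = 1, 2, y \quad (x_{yi} \equiv y_i)$$

$$B_{22} = \frac{A_{11}A_{22} - A_{12}^2}{A_{11}}, \qquad B_{2y} = \frac{A_{11}A_{2y} - A_{12}A_{1y}}{A_{11}}, \qquad B_{yy} = \frac{A_{11}A_{yy} - A_{1y}^2}{A_{11}}$$

$$D_e = \frac{B_{22}B_{yy} - B_{2y}^2}{B_{22}}, \qquad S_e^2 = \frac{1}{n-3}\sum e_i^2 = \frac{D_e}{n-3}$$

$$a_2 = \frac{B_{2y}}{B_{22}}, \qquad a_1 = \frac{A_{1y} - a_2 A_{12}}{A_{11}}, \qquad a_0 = \frac{\left(\sum y_i\right) - a_2\left(\sum x_{2i}\right) - a_1\left(\sum x_{1i}\right)}{n}$$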

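Statisticians' Solution

Assumptions A4 and A5 lead to the following random variables and their distributions:
$$E = \text{observational error in } y \text{ at } (x_1, x_2) \sim N(0, \sigma^2)$$
$$Y(x_1, x_2) = y(x_1, x_2) + E \sim N\left(y(x_1, x_2), \sigma^2\right)$$
$$A_2 = \frac{B_{2Y}}{B_{22}} \sim N\left(\alpha_2, \frac{\sigma^2}{B_{22}}\right)$$
$$A_1 = \frac{A_{1Y} - A_2 A_{12}}{A_{11}} \sim N\left(\alpha_1, \frac{A_{22}\,\sigma^2}{A_{11}A_{22} - A_{12}^2}\right)$$
$$A_0 = \frac{\left(\sum Y_i\right) - A_1\left(\sum x_{1i}\right) - A_2\left(\sum x_{2i}\right)}{n} \sim N\left(\alpha_0, \frac{\sigma^2}{n}\right)$$
Also
$$\mathrm{Cov}(A_0, A_1) = \mathrm{Cov}(A_0, A_2) = 0, \qquad \mathrm{Cov}(A_1, A_2) = -\frac{A_{12}\,\sigma^2}{A_{11}A_{22} - A_{12}^2} = -\frac{A_{12}\,\sigma^2}{A_{11}B_{22}}.$$
Then for a predicted value $\hat{Y}(x_1, x_2)$ at any point $(x_1, x_2)$ we have
$$\sigma_{\hat{Y}}^2 = \left[\frac{1}{n} + \left(\frac{1}{A_{11}} + \frac{A_{12}^2}{A_{11}^2 B_{22}}\right)x_1^2 - \frac{2A_{12}}{A_{11}B_{22}}\,x_1 x_2 + \frac{x_2^2}{B_{22}}\right]\sigma^2.$$
This, together with
$$\mu_{\hat{Y}} = \mathcal{E}(\hat{Y}) = y(x_1, x_2)$$
and the fact that
$$\hat{Y}(x_1, x_2) \sim N\left(\mu_{\hat{Y}}, \sigma_{\hat{Y}}^2\right),$$
can be used to establish confidence intervals for $y(x_1, x_2)$.

CAUTION: In deriving these formulas it was assumed that $\bar{x}_1 = \bar{x}_2 = 0$. For data in which this shift has not been made, the formulae must be adjusted.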


Quadratic Model

The quadratic model
$$y = \alpha_0 + \alpha_1 x + \alpha_2 x^2$$
can be transformed into a linear model with two predictors by the transformation
$$x_1 = x, \qquad x_2 = x^2.$$


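A-3. LINEAR LEAST SQUARES WITH THREE PREDICTORS

Sample:

$(x_{1i}, x_{2i}, x_{3i}, y_i), \quad i = 1, 2, \ldots, n$

Assumptions:

A1: $y(x_1, x_2, x_3) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$

A2: $x_{ji} = X_{ji}$, j = 1, 2, 3 (the predictors are without errors)

A3: $y_i = y(x_{1i}, x_{2i}, x_{3i}) + \epsilon_i$

Computational Format

Sums: $n$, $\sum x_{1i}$, $\sum x_{2i}$, $\sum x_{3i}$, $\sum y_i$; $\sum x_{1i}^2$, $\sum x_{1i}x_{2i}$, $\sum x_{1i}x_{3i}$, $\sum x_{1i}y_i$; $\sum x_{2i}^2$, $\sum x_{2i}x_{3i}$, $\sum x_{2i}y_i$; $\sum x_{3i}^2$, $\sum x_{3i}y_i$; $\sum y_i^2$

$$A_{uv} = \frac{n\left(\sum x_{ui}x_{vi}\right) - \left(\sum x_{ui}\right)\left(\sum x_{vi}\right)}{n}, \qquad u, v = 1, 2, 3, y$$

$$B_{uv} = \frac{A_{11}A_{uv} - A_{1u}A_{1v}}{A_{11}}, \qquad u, v = 2, 3, y$$

$$C_{uv} = \frac{B_{22}B_{uv} - B_{2u}B_{2v}}{B_{22}}, \qquad u, v = 3, y$$

$$D_e = \frac{C_{33}C_{yy} - C_{3y}^2}{C_{33}}, \qquad S_e^2 = \frac{D_e}{n-4} = \frac{1}{n-4}\sum e_i^2$$

$$a_3 = \frac{C_{3y}}{C_{33}}, \qquad a_2 = \frac{B_{2y} - a_3 B_{23}}{B_{22}}, \qquad a_1 = \frac{A_{1y} - a_3 A_{13} - a_2 A_{12}}{A_{11}}$$

$$a_0 = \frac{\left(\sum y_i\right) - a_3\left(\sum x_{3i}\right) - a_2\left(\sum x_{2i}\right) - a_1\left(\sum x_{1i}\right)}{n}$$

$$\hat{y}(x_1, x_2, x_3) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3$$

Statistics

$$\hat{Y}(x_1, x_2, x_3) = A_0 + A_1 x_1 + A_2 x_2 + A_3 x_3 = \text{prediction equation}$$

It can be seen that the $A_j$'s, and hence $\hat{Y}(x_1, x_2, x_3)$, are normally distributed. Determining their means and variances is quite mathematically involved and will be delayed until the vector solution is considered.

Cubic Polynomial

$$y(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3$$

Transformation

$$x_1 = x, \qquad x_2 = x^2, \qquad x_3 = x^3$$

$$y(x) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3$$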
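A-4. LINEAR LEAST SQUARES WITH k PREDICTORS

Sample Data:

$(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i), \quad i = 1, \ldots, n$

Linear Model:
$$Y = \alpha_0 + \alpha_1 x_1 + \cdots + \alpha_k x_k$$

Prediction:
$$\hat{y} = a_0 + a_1 x_1 + \cdots + a_k x_k$$

Computational Format

Sums: $n$, $\sum x_{1i}$, $\sum x_{2i}$, $\ldots$, $\sum x_{ki}$, $\sum y_i$; $\sum x_{1i}^2$, $\sum x_{1i}x_{2i}$, $\ldots$, $\sum x_{1i}x_{ki}$, $\sum x_{1i}y_i$; $\ldots$; $\sum x_{ki}^2$, $\sum x_{ki}y_i$; $\sum y_i^2$

First box:
$$A_{uv} = \frac{n\left(\sum x_{ui}x_{vi}\right) - \left(\sum x_{ui}\right)\left(\sum x_{vi}\right)}{n}, \qquad u, v = 1, \ldots, k, y$$

Each succeeding box eliminates one more predictor; writing $A_{1\cdot uv} \equiv A_{uv}$,
$$A_{j+1\cdot uv} = \frac{A_{j\cdot jj}A_{j\cdot uv} - A_{j\cdot ju}A_{j\cdot jv}}{A_{j\cdot jj}}, \qquad u, v = j+1, \ldots, k, y; \quad j = 1, \ldots, k-1$$

The last box gives
$$D_e = \frac{A_{k\cdot kk}A_{k\cdot yy} - A_{k\cdot ky}^2}{A_{k\cdot kk}}, \qquad S_e^2 = \frac{D_e}{n-k-1},$$
and the coefficients $a_k, a_{k-1}, \ldots, a_1, a_0$ follow by back substitution:
$$a_k = \frac{A_{k\cdot ky}}{A_{k\cdot kk}}, \qquad a_j = \frac{A_{j\cdot jy} - \sum_{m=j+1}^{k} a_m A_{j\cdot jm}}{A_{j\cdot jj}}, \quad j = k-1, \ldots, 1,$$
$$a_0 = \frac{\left(\sum y_i\right) - \sum_{j=1}^{k} a_j\left(\sum x_{ji}\right)}{n}.$$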
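A plain-Python sketch of the k-predictor format, with one list-of-lists box per condensation step followed by back substitution. It is an illustration that assumes no pivot $A_{j\cdot jj}$ is zero, and the function name is hypothetical. Applied to the quadratic of Appendix B-2 ($x_1 = t$, $x_2 = t^2$) it should reproduce $a_2 = 2.057$, $A_{ee} \approx 88.0$, and $S_e^2 = 11.00$:

```python
def condensation_fit(columns, y):
    """Pivotal condensation for y = a0 + a1 x1 + ... + ak xk.

    columns: list of k predictor sequences. Returns ([a0, ..., ak], D_e, S_e^2).
    Box 0 holds the A_uv; each further box eliminates one more predictor
    using (A_jj A_uv - A_ju A_jv) / A_jj.
    """
    n, k = len(y), len(columns)
    cols = [[float(v) for v in c] for c in columns] + [[float(v) for v in y]]
    s = [sum(c) for c in cols]                   # column sums; index k is y
    box = [[(n * sum(p * q for p, q in zip(cols[u], cols[v])) - s[u] * s[v]) / n
            for v in range(k + 1)] for u in range(k + 1)]
    boxes = [box]
    for j in range(k):                           # condense out predictor j (0-based)
        prev = boxes[-1]
        boxes.append([[prev[u][v] - prev[j][u] * prev[j][v] / prev[j][j]
                       if u > j and v > j else 0.0
                       for v in range(k + 1)] for u in range(k + 1)])
    d_e = boxes[k][k][k]                         # residual sum of squares
    a = [0.0] * (k + 1)                          # a[0] is the constant term
    for j in reversed(range(k)):                 # back substitution
        prev = boxes[j]
        a[j + 1] = (prev[j][k] - sum(a[q + 1] * prev[j][q]
                                     for q in range(j + 1, k))) / prev[j][j]
    a[0] = (s[k] - sum(a[j + 1] * s[j] for j in range(k))) / n
    return a, d_e, d_e / (n - k - 1)

t = list(range(-5, 6))
x_prime = [-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
           65.2, 153.3, 245.2, 337.2, 432.9]
a, d_e, s2 = condensation_fit([t, [v * v for v in t]], x_prime)
print([round(v, 3) for v in a], round(d_e, 2), round(s2, 2))
```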
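A-5. LINEAR LEAST SQUARES IN VECTOR FORM

The linear least squares model will be presented for k predictors $(x_1, \ldots, x_k)$. Let the sample data be $(x_{1i}, \ldots, x_{ki}, y_i)$ with $i = 1, \ldots, n$. These data can be presented as an $n \times (k+1)$ matrix $\mathbf{X}$ with elements $x_{ij}$ and a vector $\mathbf{y} = (y_1, \ldots, y_n)^T$, where $x_{i0} = 1$. The linear model then takes the form
$$y = y(x_1, \ldots, x_k) = \boldsymbol{\alpha}^T\mathbf{x} = \sum_{j=0}^{k} \alpha_j x_j, \qquad x_0 = 1,$$
where $\boldsymbol{\alpha}^T$ denotes the row vector which is the transpose of $\boldsymbol{\alpha}$, i.e.,
$$\boldsymbol{\alpha}^T = (\alpha_0, \alpha_1, \ldots, \alpha_k).$$
The fitted equation is
$$\hat{y} = \hat{y}(x_1, \ldots, x_k) = \mathbf{a}^T\mathbf{x} = \sum_{j=0}^{k} a_j x_j,$$
where the $a_j$'s are established to minimize
$$D = \sum e_i^2$$
with
$$e_i = y_i - \hat{y}(x_{1i}, \ldots, x_{ki}) = y_i - \sum_{j=0}^{k} a_j x_{ij}.$$
In vector form, we have
$$\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{a},$$
so that
$$D = \mathbf{e}^T\mathbf{e}.$$
The normal equations (to minimize D) are
$$\mathbf{X}^T\mathbf{X}\,\mathbf{a} = \mathbf{X}^T\mathbf{y}$$
with the solution
$$\mathbf{a} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}.$$
Expressing the coefficients in terms of random variables, we have
$$\mathbf{A} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y},$$
where
$$\mathbf{Y}^T = (Y_1, \ldots, Y_n) = (\mathbf{y} + \mathbf{E})^T = \mathbf{y}^T + \mathbf{E}^T$$
with
$$\mathbf{E}^T = (E_1, \ldots, E_n), \qquad \mathbf{y} = (y_1, \ldots, y_n)^T = \mathbf{X}\boldsymbol{\alpha}, \qquad \mathbf{Y} = \mathbf{y} + \mathbf{E}.$$
Using
$$\mathcal{E}(\mathbf{E}) = \mathbf{0}, \qquad \mathcal{E}(\mathbf{E}\mathbf{E}^T) = \mathbf{I}\sigma^2,$$
where $\mathbf{I}$ is the $n \times n$ identity matrix, we have
$$\mathcal{E}(\mathbf{Y}) = \mathbf{y}, \qquad \mathcal{E}(\mathbf{Y}\mathbf{Y}^T) = \mathbf{I}\sigma^2 + \mathbf{y}\mathbf{y}^T,$$
and hence the covariance matrix for $\mathbf{Y}$ is
$$\mathrm{Cov}(\mathbf{Y}, \mathbf{Y}^T) = \mathcal{E}(\mathbf{Y}\mathbf{Y}^T) - \mathbf{y}\mathbf{y}^T = \mathbf{I}\sigma^2.$$
Then
$$\mathcal{E}(\mathbf{A}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathcal{E}(\mathbf{Y}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\alpha} = \boldsymbol{\alpha}.$$
Thus $\mathbf{A}$ provides unbiased estimates for the elements of $\boldsymbol{\alpha}$.

For the variances and covariances of the coefficients we have
$$\mathrm{Cov}(\mathbf{A}, \mathbf{A}^T) = (\mathbf{X}^T\mathbf{X})^{-1}\sigma^2.$$
Finally, for $\hat{Y}$ at any $\mathbf{x}$ we have
$$\mathcal{E}(\hat{Y}) = y$$
and
$$\sigma_{\hat{Y}}^2 = \mathbf{x}^T\,\mathrm{Cov}(\mathbf{A}, \mathbf{A}^T)\,\mathbf{x} = \mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}\,\sigma^2.$$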
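In matrix form the same results take a few lines with numpy. The sketch mirrors the formulas by forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly; for numerical work `numpy.linalg.lstsq` is generally preferable. Function names are hypothetical. For the quadratic of Appendix B-2, the variance factor at t = 0 should equal $1/11 + 100/858 \approx 0.2075$, the square of $C_2(0)/k_2$ in Appendix B:

```python
import numpy as np

def vector_least_squares(X, y):
    """Solve the normal equations X^T X a = X^T y.

    X must include the leading column of ones (x_0 = 1). Returns a, S_e^2 and
    (X^T X)^{-1}, so that Cov(A, A^T) is estimated by (X^T X)^{-1} S_e^2.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    a = xtx_inv @ X.T @ y
    e = y - X @ a
    return a, float(e @ e) / (n - p), xtx_inv

def prediction_variance_factor(x, xtx_inv):
    """x^T (X^T X)^{-1} x; the variance of Yhat(x) is this factor times sigma^2."""
    x = np.asarray(x, float)
    return float(x @ xtx_inv @ x)

t = np.arange(-5, 6, dtype=float)
X = np.column_stack([np.ones_like(t), t, t**2])      # quadratic in t, as in B-2
x_prime = np.array([-371.2, -312.6, -248.9, -177.3, -102.8, -20.6,
                    65.2, 153.3, 245.2, 337.2, 432.9])
a, s2, xtx_inv = vector_least_squares(X, x_prime)
print(np.round(a, 3), round(s2, 2), prediction_variance_factor([1, 0, 0], xtx_inv))
```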
APPENDIX B

SAMPLE LEAST SQUARES CALCULATION

B-1. STRAIGHT LINE REGRESSION FOR SAMPLE II

  i    t'_i     x_i     t_i     x'_i     x̂'_i     e_xi
  1    872    2183.2    -5    -371.2   -405.9   +34.7
  2    873    2241.8    -4    -312.6   -324.7   +12.1
  3    874    2305.5    -3    -248.9   -243.5    -5.4
  4    875    2377.1    -2    -177.3   -162.3   -15.0
  5    876    2451.6    -1    -102.8    -81.2   -21.6
  6    877    2533.8     0     -20.6      0.0   -20.6
  7    878    2619.6     1      65.2     81.2   -16.0
  8    879    2707.7     2     153.3    162.4    -9.1
  9    880    2799.6     3     245.2    243.6    +1.6
 10    881    2891.6     4     337.2    324.8   +12.4
 11    882    2987.3     5     432.9    406.0   +26.9
SUM          28098.8     0       0.4      0.4     0.0

$\bar{x} = n^{-1}\sum x_i = 2554.44$, $x'_i = x_i - 2554.4$, $t_i = t'_i - 877$

$n = 11$, $\sum t_i = 0$, $\sum t_i^2 = 110$, $\sum x'_i = 0.4$, $\sum t_i x'_i = 8{,}931.2$, $\sum x_i'^2 = 728{,}868.12$

$A_{11} = 110$, $A_{1x} = 8{,}931.2$, $A_{xx} = 728{,}868.11$, $A_{ee} = 3719.62$

$a_0 = 0.04$, $a_1 = 81.19$, $S_e^2 = A_{ee}/(n-2) = 413.29$, $S_e = 20.33$

$\hat{x}'(t) = a_0 + a_1 t = 0.04 + 81.19t$

B-2. QUADRATIC REGRESSION FOR SAMPLE II

 t_1i = t_i   t_2i = t_i^2    x'_i     x̂'_i     e_xi
    -5            25        -371.2   -375.1    +3.9
    -4            16        -312.6   -312.4    -0.2
    -3             9        -248.9   -245.6    -3.3
    -2             4        -177.3   -174.7    -2.6
    -1             1        -102.8    -99.7    -3.1
     0             0         -20.6    -20.5    -0.1
     1             1          65.2     62.7    +2.5
     2             4         153.3    150.1    +3.2
     3             9         245.2    241.6    +3.6
     4            16         337.2    337.1    +0.1
     5            25         432.9    436.8    -3.9
SUM  0           110           0.4      0.3     0.1

$n = 11$, $\sum t_{1i} = 0$, $\sum t_{2i} = 110$, $\sum x'_i = 0.4$, $\sum t_{1i}^2 = 110$, $\sum t_{1i}t_{2i} = 0$, $\sum t_{1i}x'_i = 8{,}931.2$, $\sum t_{2i}^2 = 1958$, $\sum t_{2i}x'_i = 1769.2$, $\sum x_i'^2 = 728{,}868.12$

$A_{11} = 110$, $A_{12} = 0$, $A_{1x} = 8{,}931.2$, $A_{22} = 858$, $A_{2x} = 1765.2$, $A_{xx} = 728{,}868.11$

$A_{2\cdot 22} = 858$, $A_{2\cdot 2x} = 1765.2$, $A_{2\cdot xx} = 3719.62$

$A_{ee} = 87.999$

$a_0 = -20.53$, $a_1 = 81.19$, $a_2 = 2.057$

$S_e^2 = A_{ee}/8 = 11.00$, $S_e = 3.32$

$\hat{x}'(t) = -20.53 + 81.19t + 2.057t^2$
CONFIDENCE INTERVALS FOR ESTIMATES

In Appendix A, it was shown that the confidence intervals for x(t) at any time t had the form
$$\left(\hat{x}(t) - C_j(t)S_e,\; \hat{x}(t) + C_j(t)S_e\right), \qquad j = 1, 2,$$
where
$$C_1(t) = k_1\sqrt{\frac{1}{n} + \frac{t^2}{A_{11}}}$$
$$C_2(t) = k_2\sqrt{\frac{1}{n} + \left(\frac{1}{A_{11}} + \frac{A_{12}^2}{A_{11}^2 A_{2\cdot 22}}\right)t^2 - \frac{2A_{12}}{A_{11}A_{2\cdot 22}}\,t\left(t^2 - \overline{t^2}\right) + \frac{\left(t^2 - \overline{t^2}\right)^2}{A_{2\cdot 22}}}$$
are the appropriate terms for the linear and quadratic regression curves, respectively, with $\overline{t^2}$ the mean of the $t_i^2$. Taking the upper 5% points of the Student-T distribution with n-2 = 9 and n-3 = 8 degrees of freedom, we find $k_1 = 1.833$ and $k_2 = 1.860$ (strictly, these give two-sided 90% intervals; two-sided 95% intervals would use 2.262 and 2.306). Introducing the numerical values determined in the preceding sections of this appendix, we find
$$C_1(t) = 1.833\sqrt{\frac{1}{11} + \frac{t^2}{110}}, \qquad C_2(t) = 1.860\sqrt{\frac{1}{11} + \frac{t^2}{110} + \frac{(t^2 - 10)^2}{858}}$$
The values of $C_1(t)$ and $C_2(t)$ and the increments $S_{1e}C_1(t)$ and $S_{2e}C_2(t)$ are shown below, using $S_{1e} = 20.33$ and $S_{2e} = 3.32$.

  t     C_1(t)   C_2(t)   S_1e C_1(t)   S_2e C_2(t)
  0      .553     .847       11.24          2.81
 ±1      .580     .820       11.79          2.72
 ±2      .654     .765       13.30          2.54
 ±3      .762     .776       15.49          2.58
 ±4      .891     .981       18.11          3.26
 ±5     1.034    1.417       21.02          4.70
The confidence interval for x(t) is shortest at t = 0 (the sample midpoint) for the linear regression. To find the value of t in the quadratic regression for which the confidence interval is shortest, consider
$$z = \frac{1}{11} + \frac{t^2}{110} + \frac{(t^2 - 10)^2}{858}.$$
Now
$$\frac{dz}{d(t^2)} = \frac{1}{110} + \frac{2(t^2 - 10)}{858} = 0,$$
$$220\,t^2 = 2200 - 858 = 1342,$$
$$t^2 = 6.10, \qquad t = 2.47,$$
$$C_2(2.47) = 0.7535.$$
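The minimum can be confirmed numerically. A plain-Python sketch, with the function name hypothetical and the default arguments taken from the Appendix B-2 values, comparing the closed form with a brute-force search over t from 0 to 5:

```python
import math

def c2(t, k=1.860, n=11, a11=110.0, b22=858.0, mean_t2=10.0):
    """C_2(t) for the 11-point quadratic fit of Appendix B (A_12 = 0)."""
    return k * math.sqrt(1.0 / n + t * t / a11 + (t * t - mean_t2) ** 2 / b22)

# Closed form from 1/110 + 2(t^2 - 10)/858 = 0.
t_star = math.sqrt(10.0 - 858.0 / 220.0)
print(round(t_star, 2), round(c2(t_star), 4))   # 2.47 0.7535

# Brute-force check on a grid of t from 0 to 5 in steps of 0.001.
grid_t = min((c2(i / 1000.0), i / 1000.0) for i in range(5001))[1]
print(grid_t)
```

Note that $C_2$ at the midpoint (0.847) is larger than at t = 2.47, so for the quadratic the best estimate is no longer at the center of the sample.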

